Preface (Part II)



This book, Part 3 - Operators and Tensors, covers Chapters 9 through 12 of the book A Comprehensive Introduction to Linear Algebra (Addison-Wesley, 1986), by Joel G. Broida and S. Gill Williamson. Selections from Chapters 9 and 10 are covered in most upper division courses in linear algebra. Chapters 11 and 12 introduce multilinear algebra and Hilbert space. The original Preface, Contents and Index are included. Three appendices from the original manuscript are included as well as the original Bibliography. The latter is now (2012) mostly out of date. Wikipedia articles on selected subjects are generally very informative.
Preface (Parts I, II, III)



As a text, this book is intended for upper division undergraduate and beginning graduate students in mathematics, applied mathematics, and fields of science and engineering that rely heavily on mathematical methods. However, it has been organized with particular concern for workers in these diverse fields who want to review the subject of linear algebra. In other words, we have written a book which we hope will still be referred to long after any final exam is over. As a result, we have included far more material than can possibly be covered in a single semester or quarter. This accomplishes at least two things. First, it provides the basis for a wide range of possible courses that can be tailored to the needs of the student or the desire of the instructor. And second, it becomes much easier for the student to later learn the basics of several more advanced topics such as tensors and infinite-dimensional vector spaces from a point of view coherent with elementary linear algebra. Indeed, we hope that this text will be quite useful for self-study. Because of this, our proofs are extremely detailed and should allow the instructor extra time to work out exercises and provide additional examples if desired.
PREFACE



A major concern in writing this book has been to develop a text that addresses the exceptional diversity of the audience that needs to know something about the subject of linear algebra. Although seldom explicitly acknowledged, one of the central difficulties in teaching a linear algebra course to advanced students is that they have been exposed to the basic background material from many different sources and points of view. An experienced mathematician will see the essential equivalence of these points of view, but these same differences seem large and very formidable to the students. An engineering student, for example, can waste an inordinate amount of time because of some trivial mathematical concept missing from their background. A mathematics student might have learned a concept from a different point of view and not realize the equivalence of that point of view to the one currently required. Although such problems can arise in any advanced mathematics course, they seem to be particularly acute in linear algebra.

To address this problem of student diversity, we have written a very self-contained text by including a large amount of background material necessary for a more advanced understanding of linear algebra. The most elementary of this material constitutes Chapter 0, and some basic analysis is presented in three appendices. In addition, we present a thorough introduction to those aspects of abstract algebra, including groups, rings, fields and polynomials over fields, that relate directly to linear algebra. This material includes both points that may seem "trivial" as well as more advanced background material. While trivial points can be quickly skipped by the reader who knows them already, they can cause discouraging delays for some students if omitted. It is for this reason that we have tried to err on the side of over-explaining concepts, especially when these concepts appear in slightly altered forms. The more advanced reader can gloss over these details, but they are there for those who need them. We hope that more experienced mathematicians will forgive our repetitive justification of numerous facts throughout the text.

A glance at the Contents shows that we have covered those topics normally included in any linear algebra text although, as explained above, to a greater level of detail than other books. Where we differ significantly in content from most linear algebra texts, however, is in our treatment of canonical forms (Chapter 8), tensors (Chapter 11), and infinite-dimensional vector spaces (Chapter 12). In particular, our treatment of the Jordan and rational canonical forms in Chapter 8 is based entirely on invariant factors and the Smith normal form of a matrix. We feel this approach is well worth the effort required to learn it since the result is, at least conceptually, a constructive algorithm for computing the Jordan and rational forms of a matrix. However, later sections of the chapter tie together this approach with the more standard treatment in terms of cyclic subspaces. Chapter 11 presents the basic formalism of tensors as they are most commonly used by applied mathematicians, physicists and engineers. While most students first learn this material in a course on differential geometry, it is clear that virtually all the theory can be easily presented at this level, and the extension to differentiable manifolds then becomes only a technical exercise. Since this approach is all that most scientists ever need, we leave more general treatments to advanced courses on abstract algebra. Finally, Chapter 12 serves as an introduction to the theory of infinite-dimensional vector spaces. We felt it desirable to give the student some idea of the problems associated with infinite-dimensional spaces and how they are to be handled. In addition, physics students and others studying quantum mechanics should have some understanding of how linear operators and their adjoints are properly defined in a Hilbert space.

One major topic we have not treated at all is that of numerical methods. 
The main reason for this (other than that the book would have become too 
unwieldy) is that we feel at this level, the student who needs to know such 
techniques usually takes a separate course devoted entirely to the subject of 
numerical analysis. However, as a natural supplement to the present text, we 
suggest the very readable "Numerical Analysis" by I. Jacques and C. Judd 
(Chapman and Hall, 1987). 

The problems in this text have been accumulated over 25 years of teaching the subject of linear algebra. The more of these problems the students work, the better. Be particularly wary of the attitude that assumes that some of these problems are "obvious" and need not be written out or precisely articulated. There are many surprises in the problems that will be missed from this approach! While these exercises are of varying degrees of difficulty, we have not distinguished any as being particularly difficult. However, the level of difficulty ranges from routine calculations that everyone reading this book should be able to complete, to some that will require a fair amount of thought from most students.

Because of the wide range of backgrounds, interests and goals of both students and instructors, there is little point in our recommending a particular course outline based on this book. We prefer instead to leave it up to each teacher individually to decide exactly what material should be covered to meet the needs of the students. While at least portions of the first seven chapters should be read in order, the remaining chapters are essentially independent of each other. Those sections that are essentially applications of previous concepts, or else are not necessary for the rest of the book, are denoted by an asterisk (*).

Now for one last comment on our notation. We use the symbol ∎ to denote the end of a proof, and ∆ to denote the end of an example. Sections are labeled in the format "Chapter.Section," and exercises are labeled in the format "Chapter.Section.Exercise." For example, Exercise 2.3.4 refers to Exercise 4 of Section 2.3, i.e., Section 3 of Chapter 2. Books listed in the bibliography are referred to by author and copyright date.
CHAPTER 9



Linear Forms 



We are now ready to elaborate on the material of Sections 2.4, 2.5 and 5.1. Throughout this chapter, the field $\mathcal{F}$ will be assumed to be either the real or complex number system unless otherwise noted.

9.1 BILINEAR FUNCTIONALS 

Recall from Section 5.1 that the vector space $V^* = L(V, \mathcal{F})$ is defined to be the space of linear functionals $\phi: V \to \mathcal{F}$. In other words, if $\phi \in V^*$, then for every $u, v \in V$ and $a, b \in \mathcal{F}$ we have

$$\phi(au + bv) = a\phi(u) + b\phi(v) \in \mathcal{F}.$$

The space $V^*$ is called the dual space of $V$. If $V$ is finite-dimensional, then viewing $\mathcal{F}$ as a one-dimensional vector space (over $\mathcal{F}$), it follows from Theorem 5.4 that $\dim V^* = \dim V$. In particular, given a basis $\{e_i\}$ for $V$, the proof of Theorem 5.4 showed that a unique basis $\{\omega^i\}$ for $V^*$ is defined by the requirement that

$$\omega^i(e_j) = \delta^i_j$$

where we now again use superscripts to denote basis vectors in the dual space. We refer to the basis $\{\omega^i\}$ for $V^*$ as the basis dual to the basis $\{e_i\}$ for $V$.
Elements of $V^*$ are usually referred to as 1-forms, and are commonly denoted by Greek letters such as $\alpha$, $\phi$, and so forth. Similarly, we often refer to the $\omega^i$ as basis 1-forms.

Since applying Theorem 5.4 to the special case of V* directly may be 
somewhat confusing, let us briefly go through a slightly different approach to 
defining a basis for V*. 

Suppose we are given a basis $\{e_1, \dots, e_n\}$ for a finite-dimensional vector space $V$. Given any set of $n$ scalars $\phi_i$, we define the linear functional $\phi \in V^* = L(V, \mathcal{F})$ by $\phi(e_i) = \phi_i$. According to Theorem 5.1, this mapping is unique. In particular, we define $n$ linear functionals $\omega^i$ by $\omega^i(e_j) = \delta^i_j$. Conversely, given any linear functional $\phi \in V^*$, we define the $n$ scalars $\phi_i$ by $\phi_i = \phi(e_i)$. Then, given any $\phi \in V^*$ and any $v = \sum_j v^j e_j \in V$, we have on the one hand

$$\phi(v) = \phi\Big(\sum_i v^i e_i\Big) = \sum_i v^i \phi(e_i) = \sum_i \phi_i v^i$$

while on the other hand

$$\omega^i(v) = \omega^i\Big(\sum_j v^j e_j\Big) = \sum_j v^j \omega^i(e_j) = \sum_j v^j \delta^i_j = v^i.$$

Therefore $\phi(v) = \sum_i \phi_i \omega^i(v)$ for any $v \in V$, and we conclude that $\phi = \sum_i \phi_i \omega^i$. This shows that the $\omega^i$ span $V^*$, and we claim that they are in fact a basis for $V^*$.

To show that the $\omega^i$ are linearly independent, suppose $\sum_i a_i \omega^i = 0$. We must show that every $a_i = 0$. But for any $j = 1, \dots, n$ we have

$$0 = \sum_i a_i \omega^i(e_j) = \sum_i a_i \delta^i_j = a_j$$

which verifies our claim. This completes the proof that $\{\omega^i\}$ forms a basis for $V^*$.

There is another common way of denoting the action of $V^*$ on $V$ that is quite similar to the notation used for an inner product. In this approach, the action of the dual basis $\{\omega^i\}$ for $V^*$ on the basis $\{e_j\}$ for $V$ is denoted by writing $\omega^i(e_j)$ as

$$\langle \omega^i, e_j \rangle = \delta^i_j.$$

However, it should be carefully noted that this is not an inner product. In particular, the entry on the left inside the bracket is an element of $V^*$, while the entry on the right is an element of $V$. Furthermore, $\langle\ ,\ \rangle$ is linear in both entries: linearity in the first entry is just the definition of addition and scalar multiplication in the vector space $V^*$, and linearity in the second entry is the linearity of each $\phi \in V^*$. In other words, if $\phi, \theta \in V^*$, and if $u, v \in V$ and $a, b \in \mathcal{F}$, we have

$$\langle a\phi + b\theta, u \rangle = a\langle \phi, u \rangle + b\langle \theta, u \rangle$$
$$\langle \phi, au + bv \rangle = a\langle \phi, u \rangle + b\langle \phi, v \rangle.$$

These relations define what we shall call a bilinear functional $\langle\ ,\ \rangle: V^* \times V \to \mathcal{F}$ on $V^*$ and $V$ (compare this with definition IP1 of an inner product given in Section 2.4).

We summarize these results as a theorem. 

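Theorem 9.1 Let $\{e_1, \dots, e_n\}$ be a basis for $V$, and let $\{\omega^1, \dots, \omega^n\}$ be the corresponding dual basis for $V^*$ defined by $\omega^i(e_j) = \delta^i_j$. Then any $v \in V$ can be written in the forms

$$v = \sum_{i=1}^n v^i e_i = \sum_{i=1}^n \omega^i(v)\, e_i = \sum_{i=1}^n \langle \omega^i, v \rangle\, e_i$$

and any $\phi \in V^*$ can be written as

$$\phi = \sum_{i=1}^n \phi_i \omega^i = \sum_{i=1}^n \phi(e_i)\, \omega^i = \sum_{i=1}^n \langle \phi, e_i \rangle\, \omega^i.$$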

This theorem provides us with a simple interpretation of the dual basis. In particular, since we already know that any $v \in V$ has the expansion $v = \sum_i v^i e_i$ in terms of a basis $\{e_i\}$, we see that $\omega^i(v) = \langle \omega^i, v \rangle = v^i$ is just the $i$th coordinate of $v$. In other words, $\omega^i$ is just the $i$th coordinate function on $V$ (relative to the basis $\{e_i\}$).
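In coordinates the dual basis is easy to compute. The following is a minimal sketch, assuming $V = \mathbb{Q}^n$ (exact rational arithmetic via Python's `fractions` module) and representing each 1-form by its row of components $\phi_i$. If the basis vectors $e_j$ are the columns of a matrix $E$, the conditions $\omega^i(e_j) = \delta^i_j$ say exactly that the $\omega^i$ are the rows of $E^{-1}$. The helper names `inverse`, `dual_basis` and `apply_form` and the sample basis are our own, chosen for illustration.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a nonsingular square matrix by Gauss-Jordan elimination over Q."""
    n = len(matrix)
    # Build the augmented matrix [M | I] with exact Fraction entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pivot: first row at or below the diagonal with a nonzero entry
        # (next() raises StopIteration if the matrix is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(basis):
    """Return the 1-forms omega^i (as rows) with omega^i(e_j) = delta^i_j."""
    n = len(basis)
    E = [[basis[j][i] for j in range(n)] for i in range(n)]  # column j is e_j
    return inverse(E)  # row i of E^{-1} is omega^i


def apply_form(form, v):
    """Evaluate a 1-form on a vector: omega(v) = sum_i omega_i v^i."""
    return sum(a * b for a, b in zip(form, v))


basis = [[1, 1], [1, 2]]                          # e_1 = (1, 1), e_2 = (1, 2)
omega = dual_basis(basis)
coords = [apply_form(w, [3, 5]) for w in omega]   # coordinates of v = (3, 5)
```

Here `omega` consists of the rows $(2, -1)$ and $(-1, 1)$, and `coords` is $[1, 2]$, matching $v = (3, 5) = 1 \cdot e_1 + 2 \cdot e_2$ as Theorem 9.1 predicts. Row reduction is only one way to invert $E$; any exact linear solver would serve equally well.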

Let us make another observation. If we write $v = \sum_i v^i e_i$ and recall that $\phi(e_i) = \phi_i$, then (as we saw above) the linearity of $\phi$ results in

$$\langle \phi, v \rangle = \phi(v) = \phi\Big(\sum_i v^i e_i\Big) = \sum_i v^i \phi(e_i) = \sum_i \phi_i v^i$$

which looks very much like the standard inner product on $\mathbb{R}^n$. In fact, if $V$ is an inner product space, we shall see that the components of an element $\phi \in V^*$ may be related in a direct way to the components of some vector in $V$ (see Section 11.10).

It is also useful to note that given any nonzero $v \in V$, there exists $\phi \in V^*$ with the property that $\phi(v) \ne 0$. To see this, we use Theorem 2.10 to first extend $v$ to a basis $\{v, v_2, \dots, v_n\}$ for $V$. Then, according to Theorem 5.1, there exists a unique linear transformation $\phi: V \to \mathcal{F}$ such that $\phi(v) = 1$ and $\phi(v_i) = 0$ for $i = 2, \dots, n$. This $\phi$ so defined clearly has the desired property. An important consequence of this comes from noting that if $v_1, v_2 \in V$ with $v_1 \ne v_2$, then $v_1 - v_2 \ne 0$, and thus there exists $\phi \in V^*$ such that

$$0 \ne \phi(v_1 - v_2) = \phi(v_1) - \phi(v_2).$$

This proves our next result.

Theorem 9.2 If $V$ is finite-dimensional and $v_1, v_2 \in V$ with $v_1 \ne v_2$, then there exists $\phi \in V^*$ with the property that $\phi(v_1) \ne \phi(v_2)$.
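In coordinates a separating functional can be written down directly. The sketch below assumes $V = \mathcal{F}^n$ with the standard basis and uses a simpler choice than the basis-extension argument above: since $v_1 \ne v_2$, the vectors differ in some coordinate $k$, and the coordinate functional $\omega^k$ of Theorem 9.1 already separates them. The function name `separating_functional` is hypothetical.

```python
def separating_functional(v1, v2):
    """Return (k, phi) where phi = omega^k is the k-th coordinate functional
    and phi(v1) != phi(v2); raise ValueError if v1 == v2."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    for k, (a, b) in enumerate(zip(v1, v2)):
        if a != b:
            # omega^k picks out the k-th coordinate (default argument freezes k).
            return k, (lambda v, k=k: v[k])
    raise ValueError("v1 == v2, so no functional can separate them")


k, phi = separating_functional([1, 2, 3], [1, 5, 3])
```

For these two vectors the first difference is in coordinate $k = 1$ (counting from 0), and `phi` returns 2 and 5 on them. Equal vectors raise an error, mirroring the hypothesis $v_1 \ne v_2$ of Theorem 9.2.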



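Example 9.1 Consider the space $V = \mathbb{R}^2$ consisting of all column vectors of the form

$$v = \begin{pmatrix} v^1 \\ v^2 \end{pmatrix}.$$

Relative to the standard basis we have

$$v = v^1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + v^2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = v^1 e_1 + v^2 e_2.$$

If $\phi \in V^*$, then $\phi(v) = \sum_i \phi_i v^i$, and we may represent $\phi$ by the row vector $\phi = (\phi_1, \phi_2)$. In particular, if we write the dual basis as $\omega^i = (a_i, b_i)$, then we have

$$1 = \omega^1(e_1) = (a_1, b_1)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a_1 \qquad 0 = \omega^1(e_2) = (a_1, b_1)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = b_1$$
$$0 = \omega^2(e_1) = (a_2, b_2)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a_2 \qquad 1 = \omega^2(e_2) = (a_2, b_2)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = b_2$$

so that $\omega^1 = (1, 0)$ and $\omega^2 = (0, 1)$. Note that, for example,

$$\omega^1(v) = (1, 0)\begin{pmatrix} v^1 \\ v^2 \end{pmatrix} = v^1$$

as it should. ∆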
Exercises

1. Find the basis dual to the given basis for each of the following:

(a) $\mathbb{R}^2$ with basis $e_1 = (2, 1)$, $e_2 = (3, 1)$.

(b) $\mathbb{R}^3$ with basis $e_1 = (1, -1, 3)$, $e_2 = (0, 1, -1)$, $e_3 = (0, 3, -2)$.

2. Let $V$ be the space of all real polynomials of degree $\le 1$. Define $\omega^1, \omega^2 \in V^*$ by

$$\omega^1(f) = \int_0^1 f(x)\,dx \quad\text{and}\quad \omega^2(f) = \int_0^2 f(x)\,dx.$$

Find a basis $\{e_1, e_2\}$ for $V$ that is dual to $\{\omega^1, \omega^2\}$.

3. Let $V$ be the vector space of all polynomials of degree $\le 2$. Define the linear functionals $\omega^1, \omega^2, \omega^3 \in V^*$ by

$$\omega^1(f) = \int_0^1 f(x)\,dx, \qquad \omega^2(f) = f'(1), \qquad \omega^3(f) = f(0)$$

where $f'(x)$ is the usual derivative of $f(x)$. Find the basis $\{e_i\}$ for $V$ which is dual to $\{\omega^i\}$.

4. (a) Let $u, v \in V$ and suppose that $\phi(u) = 0$ implies $\phi(v) = 0$ for all $\phi \in V^*$. Show that $v = ku$ for some scalar $k$.

(b) Let $\phi, \sigma \in V^*$ and suppose that $\phi(v) = 0$ implies $\sigma(v) = 0$ for all $v \in V$. Show that $\sigma = k\phi$ for some scalar $k$.

5. Let $V = \mathcal{F}[x]$, and for $a \in \mathcal{F}$, define $\phi_a: V \to \mathcal{F}$ by $\phi_a(f) = f(a)$. Show that:

(a) $\phi_a$ is linear, i.e., that $\phi_a \in V^*$.

(b) If $a \ne b$, then $\phi_a \ne \phi_b$.

6. Let $V$ be finite-dimensional and $W$ a subspace of $V$. If $\phi \in W^*$, prove that $\phi$ can be extended to a linear functional $\Phi \in V^*$, i.e., $\Phi(w) = \phi(w)$ for all $w \in W$.



9.2 DOUBLE DUALS AND ANNIHILATORS 

We now discuss the similarity between the dual space and inner products. To elaborate on this relationship, let $V$ be finite-dimensional over the real field $\mathbb{R}$ with an inner product $\langle\ ,\ \rangle: V \times V \to \mathbb{R}$ defined on it. (There should be no confusion between the inner product on $V$ and the action of a bilinear functional on $V^* \times V$ because both entries in the inner product expressions are elements of $V$.) In fact, throughout this section we may relax our definition of inner product somewhat as follows. Referring to our definition in Section 2.4, we keep properties (IP1) and (IP2), but instead of (IP3) we require that if $u \in V$ and $\langle u, v \rangle = 0$ for all $v \in V$, then $u = 0$. Such an inner product is said to be nondegenerate. The reader should be able to see easily that (IP3) implies nondegeneracy (take $v = u$ to obtain $\langle u, u \rangle = 0$), and hence all inner products we have used so far in this book have been nondegenerate. (In Section 11.10 we will see an example of an inner product space with the property that $\langle u, u \rangle = 0$ for some $u \ne 0$.)

If we leave out the second vector entry in the inner product $\langle u,\ \rangle$, then what we have left is essentially a linear functional on $V$. In other words, given any $u \in V$, we define a linear functional $L_u \in V^*$ by

$$L_u(v) = \langle u, v \rangle$$

for all $v \in V$. From the definition of a (real) inner product, it is easy to see that this functional is indeed linear. Furthermore, it also has the property that

$$L_{au+bv} = aL_u + bL_v$$

for all $u, v \in V$ and $a, b \in \mathbb{R}$. What we have therefore done is define a linear mapping $L: V \to V^*$ by $L(u) = L_u$ for all $u \in V$. Since the inner product is nondegenerate, we see that if $u \ne 0$ then $L_u(v) = \langle u, v \rangle$ cannot vanish for all $v \in V$, and hence $L_u \ne 0$. This means that $\operatorname{Ker} L = \{0\}$, and hence the mapping must be one-to-one (Theorem 5.5). But $V$ and $V^*$ have the same dimension, and therefore this mapping is actually an isomorphism of $V$ onto $V^*$. This proves our next theorem.

Theorem 9.3 Let $V$ be finite-dimensional over $\mathbb{R}$, and assume that $V$ has a nondegenerate inner product defined on it. Then the mapping $u \mapsto L_u$ is an isomorphism of $V$ onto $V^*$.

Looking at this isomorphism as a mapping from V* onto V, we can 
reword this theorem as follows. 

Corollary Let $V$ be as in Theorem 9.3. Then, given any linear functional $L \in V^*$, there exists a unique $u \in V$ such that $L(v) = \langle u, v \rangle = L_u(v)$ for all $v \in V$. In other words, given any $L \in V^*$, there exists a unique $u \in V$ such that $L_u = L$.
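Once a basis is fixed, the corollary becomes a linear system. The following is a minimal sketch, assuming $V = \mathbb{Q}^n$ with the standard basis and an inner product given by a symmetric nonsingular Gram matrix $G$, so that $\langle u, v \rangle = \sum_{i,j} u_i G_{ij} v_j$ (nondegeneracy of such an inner product is equivalent to $G$ being nonsingular). For $L$ with components $\ell_j = L(e_j)$, the requirement $\langle u, e_j \rangle = \ell_j$ reads $Gu = \ell$. The Gram matrix $\operatorname{diag}(1, -1)$ and the names `solve`, `inner` and `representing_vector` are illustrative choices; this $G$ is nondegenerate but not positive definite, so it fits the relaxed definition above.

```python
from fractions import Fraction


def solve(G, rhs):
    """Solve G x = rhs for nonsingular G by Gauss-Jordan elimination over Q."""
    n = len(G)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(G, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def inner(G, u, v):
    """<u, v> = sum_{i,j} u_i G_ij v_j for a symmetric Gram matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def representing_vector(G, ell):
    """Unique u with <u, v> = L(v) for all v, where ell_j = L(e_j).

    Because G is symmetric, <u, e_j> = (G u)_j, so the condition is G u = ell.
    """
    return solve(G, ell)


G = [[1, 0], [0, -1]]      # nondegenerate, yet <u, u> = 0 for u = (1, 1)
ell = [3, 5]               # the functional L(v) = 3 v^1 + 5 v^2
u = representing_vector(G, ell)
```

Here `u` comes out as $(3, -5)$. The sign flip in the second component is forced by the indefinite Gram matrix, which shows that the representing vector depends on the inner product and not only on $L$.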

Note that if $V$ is a vector space over $\mathbb{C}$ with the more general Hermitian inner product defined on it, then the definition $L_u(v) = \langle u, v \rangle$ shows that $L_{au} = a^* L_u$, and the mapping $u \mapsto L_u$ is no longer an isomorphism of $V$ onto $V^*$. Such a mapping is not even linear, and is in fact called antilinear (or conjugate linear). We will return to this more general case later.
Let us now consider vector spaces $V$ and $V^*$ over an arbitrary (i.e., possibly complex) field $\mathcal{F}$. Since $V^*$ is a vector space, we can equally well define the space of linear functionals on $V^*$. By a procedure similar to that followed above, the expression $\langle\ , u \rangle$ for a fixed $u \in V$ defines a linear functional on $V^*$ (note that here $\langle\ ,\ \rangle$ is a bilinear functional and not an inner product). In other words, we define the function $f_u: V^* \to \mathcal{F}$ by

$$f_u(\phi) = \langle \phi, u \rangle = \phi(u)$$

for all $\phi \in V^*$. It follows that for all $a, b \in \mathcal{F}$ and $\phi, \omega \in V^*$ we have

$$f_u(a\phi + b\omega) = \langle a\phi + b\omega, u \rangle = a\langle \phi, u \rangle + b\langle \omega, u \rangle = af_u(\phi) + bf_u(\omega)$$

and hence $f_u$ is a linear functional from $V^*$ to $\mathcal{F}$. In other words, $f_u$ is in the dual space of $V^*$. This space is called the double dual (or second dual) of $V$, and is denoted by $V^{**}$.

Note that we saw in Section 9.1 (using Theorem 5.4) that $\dim V^* = \dim V$ for any finite-dimensional $V$, so that $V^*$ is isomorphic to $V$ and is in particular finite-dimensional. Applying the same result to $V^*$, we see that $V^{**}$ is isomorphic to $V^*$, and therefore $V$ is isomorphic to $V^{**}$. (Theorem 9.3 is not needed here; it requires a real inner product.) Our next theorem verifies this fact by explicit construction of an isomorphism from $V$ onto $V^{**}$.

Theorem 9.4 Let $V$ be finite-dimensional over $\mathcal{F}$, and for each $u \in V$ define the function $f_u: V^* \to \mathcal{F}$ by $f_u(\phi) = \phi(u)$ for all $\phi \in V^*$. Then the mapping $f: u \mapsto f_u$ is an isomorphism of $V$ onto $V^{**}$.

Proof We first show that the mapping $f: u \mapsto f_u$ defined above is linear. For any $u, v \in V$ and $a, b \in \mathcal{F}$ we see that

$$\begin{aligned} f_{au+bv}(\phi) &= \langle \phi, au + bv \rangle \\ &= a\langle \phi, u \rangle + b\langle \phi, v \rangle \\ &= af_u(\phi) + bf_v(\phi) \\ &= (af_u + bf_v)(\phi). \end{aligned}$$

Since this holds for all $\phi \in V^*$, it follows that $f_{au+bv} = af_u + bf_v$, and hence the mapping $f$ is indeed linear (so it defines a vector space homomorphism).

Now let $u \in V$ be an arbitrary nonzero vector. By Theorem 9.2 (with $v_1 = u$ and $v_2 = 0$) there exists $\phi \in V^*$ such that $f_u(\phi) = \langle \phi, u \rangle \ne 0$, and hence clearly $f_u \ne 0$. Since it is obviously true that $f_0 = 0$, it follows that $\operatorname{Ker} f = \{0\}$, and thus we have a one-to-one mapping from $V$ into $V^{**}$ (Theorem 5.5).

Finally, since $V$ is finite-dimensional, we see that $\dim V = \dim V^* = \dim V^{**}$, and hence the mapping $f$ must be onto (since it is one-to-one). ∎

The isomorphism $f: u \mapsto f_u$ defined in Theorem 9.4 is called the natural (or evaluation) mapping of $V$ into $V^{**}$. (We remark without proof that even if $V$ is infinite-dimensional this mapping is linear and injective, but is not surjective.) Because of this isomorphism, we will make the identification $V = V^{**}$ from now on, and hence also view $V$ as the space of linear functionals on $V^*$. Furthermore, if $\{\omega^i\}$ is a basis for $V^*$, then the dual basis $\{e_i\}$ for $V$ will be taken to be the basis for $V^{**}$. In other words, we may write

$$\omega^i(e_j) = e_j(\omega^i) = \delta^i_j$$

so that

$$\phi(v) = v(\phi) = \sum_i \phi_i v^i.$$
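The evaluation map is easy to model if vectors are coordinate lists and 1-forms are Python functions. In this minimal sketch (the helper names `form`, `evaluation` and `combine` and the sample data are hypothetical), `evaluation(u)` plays the role of $f_u$: it accepts a 1-form and returns a scalar, so it is itself a functional on $V^*$. The checks test linearity of $u \mapsto f_u$ on a few 1-forms, the first step in the proof of Theorem 9.4, and the identity $f_u(\omega^i) = u^i$.

```python
def form(components):
    """The 1-form phi(v) = sum_i phi_i v^i on F^n, given its components phi_i."""
    return lambda v: sum(c * x for c, x in zip(components, v))


def evaluation(u):
    """The natural map of Theorem 9.4: u -> f_u, where f_u(phi) = phi(u)."""
    return lambda phi: phi(u)


def combine(a, u, b, v):
    """The vector a*u + b*v, computed componentwise."""
    return [a * x + b * y for x, y in zip(u, v)]


u, v, a, b = [1, 2], [3, -1], 2, -3
test_forms = [form([5, 4]), form([1, 0]), form([0, 1])]
# Linearity of u -> f_u, checked on a few 1-forms: f_{au+bv} = a f_u + b f_v.
linear = all(evaluation(combine(a, u, b, v))(phi)
             == a * evaluation(u)(phi) + b * evaluation(v)(phi)
             for phi in test_forms)
```

Checking a handful of functionals is only an illustration, not a proof; the proof in the text uses the bilinearity of $\langle\ ,\ \rangle$ for every $\phi \in V^*$.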

Now let $S$ be an arbitrary subset of a vector space $V$. We call the set of elements $\phi \in V^*$ with the property that $\phi(v) = 0$ for all $v \in S$ the annihilator of $S$, and we denote it by $S^\circ$. In other words,

$$S^\circ = \{\phi \in V^*: \phi(v) = 0 \text{ for all } v \in S\}.$$

It is easy to see that $S^\circ$ is a subspace of $V^*$. Indeed, suppose that $\phi, \omega \in S^\circ$, let $a, b \in \mathcal{F}$ and let $v \in S$ be arbitrary. Then

$$(a\phi + b\omega)(v) = a\phi(v) + b\omega(v) = 0 + 0 = 0$$

so that $a\phi + b\omega \in S^\circ$. Note also that we clearly have $0 \in S^\circ$, and if $T \subset S$, then $S^\circ \subset T^\circ$.

If we let $\bar{S}$ be the linear span of a subset $S \subset V$, then it is easy to see that $\bar{S}^\circ = S^\circ$. Indeed, if $u \in \bar{S}$ is arbitrary, then there exist scalars $a_1, \dots, a_r$ such that $u = \sum_i a_i v_i$ for some set of vectors $\{v_1, \dots, v_r\} \subset S$. But then for any $\phi \in S^\circ$ we have

$$\phi(u) = \phi\Big(\sum_i a_i v_i\Big) = \sum_i a_i \phi(v_i) = 0$$

and hence $\phi \in \bar{S}^\circ$. Conversely, if $\phi \in \bar{S}^\circ$ then $\phi$ annihilates every $v \in S$ (since $S \subset \bar{S}$), and hence $\phi \in S^\circ$. The main conclusion to deduce from this observation is that to find the annihilator of a subspace $W$ of $V$, it suffices to find the linear functionals that annihilate any basis for $W$ (see Example 9.2 below).
Just as we talked about the second dual of a vector space, we may define the space $S^{\circ\circ}$ in the obvious manner by

$$S^{\circ\circ} = (S^\circ)^\circ = \{v \in V: \phi(v) = 0 \text{ for all } \phi \in S^\circ\}.$$

This is allowed because of our identification of $V$ and $V^{**}$ under the isomorphism $u \mapsto f_u$. To be precise, note that if $v \in S \subset V$ is arbitrary, then for any $\phi \in S^\circ$ we have $f_v(\phi) = \phi(v) = 0$, and hence $f_v \in (S^\circ)^\circ = S^{\circ\circ}$. But by our identification of $v$ and $f_v$ (i.e., the identification of $V$ and $V^{**}$) it follows that $v \in S^{\circ\circ}$, and thus $S \subset S^{\circ\circ}$. If $S$ happens to be a subspace of $V$, then we can in fact say more than this.

Theorem 9.5 Let $V$ be finite-dimensional and $W$ a subspace of $V$. Then

(a) $\dim W^\circ = \dim V - \dim W$.

(b) $W^{\circ\circ} = W$.

Proof (a) Assume that $\dim V = n$ and $\dim W = m \le n$. If we choose a basis $\{w_1, \dots, w_m\}$ for $W$, then we may extend this to a basis

$$\{w_1, \dots, w_m, v_1, \dots, v_{n-m}\}$$

for $V$ (Theorem 2.10). Corresponding to this basis for $V$, we define the dual basis

$$\{\phi^1, \dots, \phi^m, \theta^1, \dots, \theta^{n-m}\}$$

for $V^*$. By definition of dual basis we then have $\theta^i(v_j) = \delta^i_j$ and $\theta^i(w_j) = 0$ for all $w_j$. This shows that $\theta^i \in W^\circ$ for each $i = 1, \dots, n - m$. We claim that $\{\theta^i\}$ forms a basis for $W^\circ$.

Since each $\theta^i$ is an element of a basis for $V^*$, the set $\{\theta^i\}$ must be linearly independent. Now let $\sigma \in W^\circ$ be arbitrary. Applying Theorem 9.1 (and remembering that $w_i \in W$) we have

$$\sigma = \sum_{i=1}^m \langle \sigma, w_i \rangle \phi^i + \sum_{j=1}^{n-m} \langle \sigma, v_j \rangle \theta^j = \sum_{j=1}^{n-m} \langle \sigma, v_j \rangle \theta^j.$$

This shows that the $\theta^i$ also span $W^\circ$, and hence they form a basis for $W^\circ$. Therefore $\dim W^\circ = n - m = \dim V - \dim W$.

(b) Recall that the discussion preceding this theorem showed that $W \subset W^{\circ\circ}$. To show that $W = W^{\circ\circ}$, we need only show that $\dim W = \dim W^{\circ\circ}$. However, since $W^\circ$ is a subspace of $V^*$ and $\dim V^* = \dim V$, we may apply part (a) (with $V^*$ in place of $V$, recalling that $V^{**} = V$) to obtain

$$\begin{aligned} \dim W^{\circ\circ} &= \dim V^* - \dim W^\circ \\ &= \dim V^* - (\dim V - \dim W) \\ &= \dim W. \end{aligned}$$

∎

Example 9.2 Let $W \subset \mathbb{R}^4$ be the two-dimensional subspace spanned by the (column) vectors $w_1 = (1, 2, -3, 4)$ and $w_2 = (0, 1, 4, -1)$. To find a basis for $W^\circ$, we seek $\dim W^\circ = 4 - 2 = 2$ independent linear functionals $\phi$ of the form $\phi(x, y, z, t) = ax + by + cz + dt$ such that $\phi(w_1) = \phi(w_2) = 0$. (This is just $\phi(w) = \sum_i \phi_i w^i$ where $w = (x, y, z, t)$ and $\phi = (a, b, c, d)$.) This means that we must solve the set of linear equations

$$\begin{aligned} \phi(1, 2, -3, 4) &= a + 2b - 3c + 4d = 0 \\ \phi(0, 1, 4, -1) &= b + 4c - d = 0 \end{aligned}$$

which are already in row-echelon form with $c$ and $d$ as free variables (see Section 3.5). We are therefore free to choose any two linearly independent pairs of values $(c, d)$ in order to obtain independent solutions.

If we let $c = 1$ and $d = 0$, then back-substitution gives $b = -4$ and $a = 11$, which yields the linear functional $\phi^1(x, y, z, t) = 11x - 4y + z$. If we let $c = 0$ and $d = 1$, then we obtain $a = -6$ and $b = 1$ so that $\phi^2(x, y, z, t) = -6x + y + t$. Therefore a basis for $W^\circ$ is given by the pair $\{\phi^1, \phi^2\}$. In component form, these basis (row) vectors are simply

$$\phi^1 = (11, -4, 1, 0)$$
$$\phi^2 = (-6, 1, 0, 1).$$

∆

This example suggests a general approach to finding the annihilator of a subspace $W$ of $\mathcal{F}^n$. To see this, first suppose that we have $m < n$ linear equations in $n$ unknowns:

$$\sum_{j=1}^n a_{ij} x_j = 0$$

for each $i = 1, \dots, m$. If we define the $m$ linear functionals $\phi^i$ by

$$\phi^i(x_1, \dots, x_n) = \sum_{j=1}^n a_{ij} x_j$$

then we see that the solution space of our system of equations is nothing more than the subspace of $\mathcal{F}^n$ that is annihilated by $\{\phi^i\}$. Recalling the material of Section 3.5, we know that the solution space to this system is found by row-reducing the matrix $A = (a_{ij})$. Note also that the row vectors $A_i$ are just the coordinates of the linear functional $\phi^i$ relative to the basis of $\mathcal{F}^{n*}$ that is dual to the standard basis for $\mathcal{F}^n$.

Now suppose that for each $i = 1, \dots, m$ we are given the vector $n$-tuple $v_i = (a_{i1}, \dots, a_{in}) \in \mathcal{F}^n$. What we would like to do is find the annihilator of the subspace $W \subset \mathcal{F}^n$ that is spanned by the vectors $v_i$. From the previous section (and the above example) we know that any linear functional $\phi$ on $\mathcal{F}^n$ must have the form $\phi(x_1, \dots, x_n) = \sum_{j=1}^n c_j x_j$, and hence the annihilator we seek satisfies the condition

$$\phi(v_i) = \phi(a_{i1}, \dots, a_{in}) = \sum_{j=1}^n a_{ij} c_j = 0$$

for each $i = 1, \dots, m$. In other words, each element $(c_1, \dots, c_n)$ of the annihilator is a solution of the homogeneous system

$$\sum_{j=1}^n a_{ij} c_j = 0, \qquad i = 1, \dots, m.$$

Example 9.3 Let $W \subset \mathbb{R}^5$ be spanned by the four vectors

$$v_1 = (2, -2, 3, 4, -1) \qquad v_2 = (-1, 1, 2, 5, 2)$$
$$v_3 = (0, 0, -1, -2, 3) \qquad v_4 = (1, -1, 2, 3, 0).$$

Then $W^\circ$ is found by row-reducing the matrix $A$ whose rows are the spanning vectors $v_i$ of $W$:

$$A = \begin{pmatrix} 2 & -2 & 3 & 4 & -1 \\ -1 & 1 & 2 & 5 & 2 \\ 0 & 0 & -1 & -2 & 3 \\ 1 & -1 & 2 & 3 & 0 \end{pmatrix}.$$

Using standard techniques, the reduced matrix is easily found to be

$$\begin{pmatrix} 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

This is equivalent to the equations

$$\begin{aligned} c_1 - c_2 - c_4 &= 0 \\ c_3 + 2c_4 &= 0 \\ c_5 &= 0 \end{aligned}$$

and hence the free variables are $c_2$ and $c_4$. Note that the row-reduced form of $A$ shows that $\dim W = 3$, and hence $\dim W^\circ = 5 - 3 = 2$. Choosing $c_2 = 1$ and $c_4 = 0$ yields $c_1 = 1$ and $c_3 = 0$, and hence one of the basis vectors for $W^\circ$ is given by $\phi^1 = (1, 1, 0, 0, 0)$. Similarly, choosing $c_2 = 0$ and $c_4 = 1$ results in the other basis vector $\phi^2 = (1, 0, -2, 1, 0)$. ∆
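The row-reduction procedure of Examples 9.2 and 9.3 can be sketched in a few lines. The following is a minimal sketch over $\mathbb{Q}$ (exact arithmetic via `fractions`), with helper names `rref` and `annihilator_basis` of our own choosing: the rows of $A$ are the spanning vectors of $W$, the components $(c_1, \dots, c_n)$ of $\phi \in W^\circ$ solve $\sum_j a_{ij} c_j = 0$, and setting each free variable to 1 (the others to 0) produces one basis functional, exactly as in the examples.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q; returns (matrix, pivot_columns)."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue                          # c is a free column
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [x / piv for x in A[r]]        # make the pivot 1
        for i in range(m):                    # clear column c in all other rows
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return A, pivots


def annihilator_basis(spanning_vectors):
    """Basis (as component rows) of W° for W = span(spanning_vectors) in Q^n.

    phi = (c_1, ..., c_n) lies in W° iff sum_j a_ij c_j = 0 for every spanning
    vector v_i = (a_i1, ..., a_in). Each free variable set to 1 (the others
    to 0) gives one basis functional; pivot variables follow from the RREF.
    """
    R, pivots = rref(spanning_vectors)
    n = len(spanning_vectors[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        phi = [Fraction(0)] * n
        phi[f] = Fraction(1)
        for row, p in zip(R, pivots):         # the k-th RREF row has its pivot in column p
            phi[p] = -row[f]
        basis.append(phi)
    return basis


ex92 = annihilator_basis([[1, 2, -3, 4], [0, 1, 4, -1]])
W93 = [[2, -2, 3, 4, -1], [-1, 1, 2, 5, 2], [0, 0, -1, -2, 3], [1, -1, 2, 3, 0]]
ex93 = annihilator_basis(W93)
```

For Example 9.2 this returns $(11, -4, 1, 0)$ and $(-6, 1, 0, 1)$, and for Example 9.3 it returns $(1, 1, 0, 0, 0)$ and $(1, 0, -2, 1, 0)$, in agreement with the text. In both cases the number of functionals is $n - \dim W$, as Theorem 9.5(a) requires.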



Exercises 

1. Let $U$ and $W$ be subspaces of $V$ (which may be infinite-dimensional). Prove that:

(a) $(U + W)^\circ = U^\circ \cap W^\circ$

(b) $(U \cap W)^\circ = U^\circ + W^\circ$

Compare with Exercise 2.5.2.

2. Let $V$ be finite-dimensional and $W$ a subspace of $V$. Prove that $W^*$ is isomorphic to $V^*/W^\circ$ and (independently of Theorem 9.5) also that

$$\dim W^\circ = \dim V - \dim W.$$

[Hint: Consider the mapping $T: V^* \to W^*$ defined by $T\phi = \phi_W$ where $\phi_W$ is the restriction of $\phi \in V^*$ to $W$. Show that $T$ is a surjective linear transformation and that $\operatorname{Ker} T = W^\circ$. Now apply Exercise 1.5.11 and Theorems 5.4 and 7.34.]

3. Let $V$ be an $n$-dimensional vector space. An $(n - 1)$-dimensional subspace of $V$ is said to be a hyperspace (or hyperplane). If $W$ is an $m$-dimensional subspace of $V$, show that $W$ is the intersection of $n - m$ hyperspaces in $V$.

4. Let $U$ and $W$ be subspaces of a finite-dimensional vector space $V$. Prove that $U = W$ if and only if $U^\circ = W^\circ$.

5. Let $\{e_1, \dots, e_5\}$ be the standard basis for $\mathbb{R}^5$, and let $W \subset \mathbb{R}^5$ be spanned by the three vectors

$$\begin{aligned} w_1 &= e_1 + 2e_2 + e_3 \\ w_2 &= e_2 + 3e_3 + 3e_4 + e_5 \\ w_3 &= e_1 + 4e_2 + 6e_3 + 4e_4 + e_5. \end{aligned}$$

Find a basis for $W^\circ$.
9.3 THE TRANSPOSE OF A LINEAR TRANSFORMATION

Suppose $U$ and $V$ are vector spaces over a field $\mathcal{F}$, and let $U^*$ and $V^*$ be the corresponding dual spaces. We will show that any $T \in L(U, V)$ induces a linear transformation $T^* \in L(V^*, U^*)$ in a natural way. We begin by recalling our discussion in Section 5.4 on the relationship between two bases for a vector space. In particular, if a space $V$ has two bases $\{e_i\}$ and $\{\bar{e}_i\}$, we seek the relationship between the corresponding dual bases $\{\omega^i\}$ and $\{\bar{\omega}^i\}$ for $V^*$. This is given by the following theorem.

Theorem 9.6 Let $\{e_i\}$ and $\{\bar{e}_i\}$ be two bases for a finite-dimensional vector space $V$, and let $\{\omega^i\}$ and $\{\bar{\omega}^i\}$ be the corresponding dual bases for $V^*$. If $P$ is the transition matrix from the basis $\{e_i\}$ to the basis $\{\bar{e}_i\}$, then $(P^{-1})^T$ is the transition matrix from the $\{\omega^i\}$ basis to the $\{\bar{\omega}^i\}$ basis.

Proof Let $\dim V = n$. By definition of $P = (p_{ij})$ we have

$$\bar{e}_i = \sum_{j=1}^n e_j p_{ji}$$

for each $i = 1, \dots, n$. Similarly, let us define the (transition) matrix $Q = (q_{ij})$ by the requirement that

$$\bar{\omega}^i = \sum_{j=1}^n \omega^j q_{ji}.$$

We must show that $Q = (P^{-1})^T$. To see this, first note that the $i$th column of $Q$ is $Q^i = (q_{1i}, \dots, q_{ni})$ and the $j$th row of $P^T$ is $P^T_j = (p^T_{j1}, \dots, p^T_{jn})$. From the definition of dual bases, we then see that

$$\delta^i_j = \bar{\omega}^i(\bar{e}_j) = \sum_{k,r} q_{ki} p_{rj}\, \omega^k(e_r) = \sum_{k,r} q_{ki} p_{rj}\, \delta^k_r = \sum_k q_{ki} p_{kj} = \sum_k p^T_{jk} q_{ki} = (P^T Q)_{ji}.$$

In other words, $P^T Q = I$. Since $P$ is a transition matrix it is nonsingular, and hence this shows that $Q = (P^T)^{-1} = (P^{-1})^T$ (Theorem 3.21, Corollary 4). ∎
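Theorem 9.6 is easy to check on a small case. This is a minimal sketch, assuming $V = \mathbb{R}^2$ with the standard basis $\{e_i\}$ (whose dual basis consists of the coordinate functionals) and an illustrative, deliberately non-symmetric transition matrix $P$ of our own choosing; the 2×2 adjugate formula in the hypothetical helper `inv2` only keeps the sketch short. With the conventions $\bar{e}_i = \sum_j e_j p_{ji}$ and $\bar{\omega}^i = \sum_j \omega^j q_{ji}$ from the proof, the code forms $Q = (P^{-1})^T$ and checks both $P^T Q = I$ and $\bar{\omega}^i(\bar{e}_j) = \delta^i_j$.

```python
from fractions import Fraction


def inv2(M):
    """Inverse of a 2x2 matrix by the adjugate formula (assumes det != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Old basis: the standard basis e_1, e_2 of R^2, whose dual basis is
# omega^1 = (1, 0), omega^2 = (0, 1).
# New basis: ebar_i = sum_j e_j p_ji, i.e. ebar_i is the i-th column of P.
P = [[1, 2],
     [0, 1]]                      # illustrative, deliberately non-symmetric
Q = transpose(inv2(P))            # Theorem 9.6: Q = (P^{-1})^T
new_basis = transpose(P)          # rows are ebar_1 = (1, 0), ebar_2 = (2, 1)
# omegabar^i = sum_j omega^j q_ji has components (q_1i, q_2i): the i-th column of Q.
new_dual = transpose(Q)
pairing = [[sum(w * x for w, x in zip(om, eb)) for eb in new_basis] for om in new_dual]
```

Here `pairing` is the identity matrix and $\bar{\omega}^1 = (1, -2)$, $\bar{\omega}^2 = (0, 1)$. Note that $Q \ne P^{-1}$ for this $P$; the transpose matters precisely because $P$ is not symmetric.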

Now suppose that $T \in L(V, U)$. We define a mapping $T^*: U^* \to V^*$ by the rule

$$T^*\phi = \phi \circ T$$

for all $\phi \in U^*$. (The mapping $T^*$ is frequently written $T^t$.) In other words, for any $v \in V$ we have

$$(T^*\phi)(v) = (\phi \circ T)(v) = \phi(T(v)) \in \mathcal{F}.$$

To show that $T^*\phi$ is indeed an element of $V^*$, we simply note that for $v_1, v_2 \in V$ and $a, b \in \mathcal{F}$ we have (using the linearity of $T$ and $\phi$)

$$\begin{aligned} (T^*\phi)(av_1 + bv_2) &= \phi(T(av_1 + bv_2)) \\ &= \phi(aT(v_1) + bT(v_2)) \\ &= a\phi(T(v_1)) + b\phi(T(v_2)) \\ &= a(T^*\phi)(v_1) + b(T^*\phi)(v_2) \end{aligned}$$

(this also follows directly from Theorem 5.2). Furthermore, it is easy to see that the mapping $T^*$ is linear since for any $\phi, \theta \in U^*$ and $a, b \in \mathcal{F}$ we have

$$T^*(a\phi + b\theta) = (a\phi + b\theta) \circ T = a(\phi \circ T) + b(\theta \circ T) = a(T^*\phi) + b(T^*\theta).$$

Hence we have proven the next result.

Theorem 9.7 Suppose $T \in L(V, U)$, and define the mapping $T^*: U^* \to V^*$ by $T^*\phi = \phi \circ T$ for all $\phi \in U^*$. Then $T^* \in L(U^*, V^*)$.

The linear mapping T* defined in this theorem is called the transpose of 
the linear transformation T. The reason for the name transpose is shown in the 
next theorem. Note that we make a slight change in our notation for elements 
of the dual space in order to keep everything as simple as possible. 

Theorem 9.8 Let $T \in L(V, U)$ have matrix representation $A = (a_{ij})$ with respect to the bases $\{v_1, . . . , v_m\}$ for V and $\{u_1, . . . , u_n\}$ for U. Let the dual spaces V* and U* have the corresponding dual bases $\{v^i\}$ and $\{u^i\}$. Then the matrix representation of $T^* \in L(U^*, V^*)$ with respect to these bases for U* and V* is given by $A^T$.

Proof By definition of $A = (a_{ij})$ we have

$$Tv_i = \sum_{j=1}^n u_j\, a_{ji}$$

for each i = 1, . . . , m. Define the matrix representation $B = (b_{ij})$ of $T^*$ by

$$T^*u^i = \sum_{j=1}^m v^j\, b_{ji}$$

for each i = 1, . . . , n. Applying the left side of this equation to an arbitrary basis vector $v_k$, we find

$$(T^*u^i)v_k = u^i(Tv_k) = u^i\Big(\sum_j u_j\, a_{jk}\Big) = \sum_j u^i(u_j)\, a_{jk} = \sum_j \delta^i_j\, a_{jk} = a_{ik}$$

while the right side yields

$$\sum_j b_{ji}\, v^j(v_k) = \sum_j b_{ji}\, \delta^j_k = b_{ki} .$$

Therefore $b_{ki} = a_{ik} = a^T_{ki}$, and thus $B = A^T$. ∎
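Theorem 9.8 can be tested in the same spirit. In this sketch (an illustration under our own assumptions) $V = \mathcal{F}^3$ and $U = \mathcal{F}^2$ carry their standard bases, T is multiplication by an arbitrarily chosen 2 × 3 matrix A, and the dual-basis functional $u^i$ simply picks out the ith coordinate. The matrix B of $T^*$ is built entry by entry from $b_{ki} = (T^*u^i)(v_k) = u^i(Tv_k)$ and compared with $A^T$; all function names are hypothetical.

```python
# T: F^3 -> F^2, given by an arbitrary 2x3 matrix A relative to the standard bases.
A = [[1, -2, 4],
     [3,  0, 5]]
n, m = len(A), len(A[0])   # dim U = n = 2, dim V = m = 3

def T(v):
    return [sum(A[i][k] * v[k] for k in range(m)) for i in range(n)]

def basis_vector(k):
    """v_k: the k-th standard basis vector of V = F^m."""
    return [int(j == k) for j in range(m)]

def coordinate(i):
    """u^i: the dual-basis functional on U that picks out the i-th component."""
    return lambda w: w[i]

def transpose_map(phi):
    """T*(phi) = phi o T."""
    return lambda v: phi(T(v))

# B = (b_ki) is defined by T* u^i = sum_k v^k b_ki, so b_ki = (T* u^i)(v_k).
B = [[transpose_map(coordinate(i))(basis_vector(k)) for i in range(n)] for k in range(m)]
A_T = [list(col) for col in zip(*A)]
print(B)          # [[1, 3], [-2, 0], [4, 5]]
print(B == A_T)   # True
```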

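Example 9.4 If $T \in L(V, U)$, let us show that $\operatorname{Ker} T^* = (\operatorname{Im} T)^0$. (Remember that $T^*: U^* \to V^*$.) Let $\phi \in \operatorname{Ker} T^*$ be arbitrary, so that $0 = T^*\phi = \phi \circ T$. If $u \in U$ is any element in Im T, then there exists $v \in V$ such that $u = Tv$. Hence

$$\phi(u) = \phi(Tv) = (T^*\phi)v = 0$$

and thus $\phi \in (\operatorname{Im} T)^0$. This shows that $\operatorname{Ker} T^* \subset (\operatorname{Im} T)^0$.

Now suppose $\theta \in (\operatorname{Im} T)^0$ so that $\theta(u) = 0$ for all $u \in \operatorname{Im} T$. Then for any $v \in V$ we have

$$(T^*\theta)v = \theta(Tv) \in \theta(\operatorname{Im} T) = 0$$

and hence $T^*\theta = 0$. This shows that $\theta \in \operatorname{Ker} T^*$ and therefore $(\operatorname{Im} T)^0 \subset \operatorname{Ker} T^*$. Combined with the previous result, we see that $\operatorname{Ker} T^* = (\operatorname{Im} T)^0$. ∆

Example 9.5 Suppose $T \in L(V, U)$ and recall that r(T) is defined to be the number dim(Im T). We will show that $r(T) = r(T^*)$. From Theorem 9.5 we have

$$\dim(\operatorname{Im} T)^0 = \dim U - \dim(\operatorname{Im} T) = \dim U - r(T)$$

and from the previous example it follows that

$$\operatorname{nul} T^* = \dim(\operatorname{Ker} T^*) = \dim(\operatorname{Im} T)^0 .$$

Therefore (using Theorem 5.6) we see that

$$r(T^*) = \dim U^* - \operatorname{nul} T^* = \dim U - \operatorname{nul} T^* = \dim U - \dim(\operatorname{Im} T)^0 = r(T) .$$

∆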
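In matrix language Example 9.5 says that A and $A^T$ always have the same rank, which is the content of Exercise 1 below. The sketch here (our own, using a hypothetical `rank` helper based on Gaussian elimination over the rationals) lets one test the statement on examples.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

# A 3x4 matrix whose third row is the sum of the first two, so r(A) = 2.
A = [[1, 2, 0, -1],
     [0, 1, 3,  2],
     [1, 3, 3,  1]]
print(rank(A), rank(transpose(A)))  # 2 2
```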
Exercises

1. Suppose $A \in M_{m\times n}(\mathcal{F})$. Use Example 9.5 to give a simple proof that rr(A) = cr(A).

2. Let $V = \mathbb{R}^2$ and define $\phi \in V^*$ by $\phi(x, y) = 3x - 2y$. For each of the following linear transformations $T \in L(\mathbb{R}^3, \mathbb{R}^2)$, find $(T^*\phi)(x, y, z)$:

(a) T(x, y, z) = (x + y, y + z).

(b) T(x, y, z) = (x + y + z, 2x - y).

3. If $S \in L(U, V)$ and $T \in L(V, W)$, prove that $(T \circ S)^* = S^* \circ T^*$.

4. Let V be finite-dimensional, and suppose that $T \in L(V)$. Show that the mapping $T \mapsto T^*$ defines an isomorphism of L(V) onto L(V*).

5. Let $V = \mathbb{R}[x]$, suppose $a, b \in \mathbb{R}$ are fixed, and define $\phi \in V^*$ by

$$\phi(f) = \int_a^b f(x)\, dx .$$

If D is the usual differentiation operator on V, find $D^*\phi$.

6. Let $V = M_n(\mathcal{F})$, let $B \in V$ be fixed, and define $T \in L(V)$ by

$$T(A) = AB - BA .$$

If $\phi \in V^*$ is defined by $\phi(A) = \operatorname{Tr} A$, find $T^*\phi$.



9.4 BILINEAR FORMS 

In order to facilitate our treatment of operators (as well as our later discussion of the tensor product), it is worth generalizing slightly some of what we have done so far in this chapter. Let U and V be vector spaces over $\mathcal{F}$. We say that a mapping $f: U \times V \to \mathcal{F}$ is bilinear if it has the following properties for all $u_1, u_2 \in U$, for all $v_1, v_2 \in V$ and all $a, b \in \mathcal{F}$:

(1) $f(au_1 + bu_2, v_1) = af(u_1, v_1) + bf(u_2, v_1)$.

(2) $f(u_1, av_1 + bv_2) = af(u_1, v_1) + bf(u_1, v_2)$.

In other words, f is bilinear if for each $v \in V$ the mapping $u \mapsto f(u, v)$ is linear, and if for each $u \in U$ the mapping $v \mapsto f(u, v)$ is linear. In the particular case that V = U, the bilinear map $f: V \times V \to \mathcal{F}$ is called a bilinear form on V. (Note that a bilinear form is defined on V × V, while a bilinear functional was defined on V* × V.) Rather than write expressions like f(u, v), we will sometimes write the bilinear map as ⟨u, v⟩ if there is no need to refer to the mapping f explicitly. While this notation is used to denote several different operations, the context generally makes it clear exactly what is meant.

We say that the bilinear map $f: U \times V \to \mathcal{F}$ is nondegenerate if f(u, v) = 0 for all $v \in V$ implies that u = 0, and f(u, v) = 0 for all $u \in U$ implies that v = 0.

Example 9.6 Suppose $A = (a_{ij}) \in M_n(\mathcal{F})$. Then we may interpret A as a bilinear form on $\mathcal{F}^n$ as follows. In terms of the standard basis $\{e_i\}$ for $\mathcal{F}^n$, any $X \in \mathcal{F}^n$ may be written as $X = \sum_i x^i e_i$, and hence for all $X, Y \in \mathcal{F}^n$ we define the bilinear form $f_A$ by

$$f_A(X, Y) = \sum_{i,j} a_{ij}\, x^i y^j = X^T A Y .$$

Here the row vector $X^T$ is the transpose of the column vector X, and the expression $X^T A Y$ is just the usual matrix product. It should be easy for the reader to verify that $f_A$ is actually a bilinear form on $\mathcal{F}^n$. ∆

Example 9.7 Suppose $\alpha, \beta \in V^*$. Since α and β are linear, we may define a bilinear form $f: V \times V \to \mathcal{F}$ by

$$f(u, v) = \alpha(u)\beta(v)$$

for all $u, v \in V$. This form is usually denoted by $\alpha \otimes \beta$ and is called the tensor product of α and β. In other words, the tensor product of two elements $\alpha, \beta \in V^*$ is defined for all $u, v \in V$ by

$$(\alpha \otimes \beta)(u, v) = \alpha(u)\beta(v) .$$

We may also define the bilinear form $g: V \times V \to \mathcal{F}$ by

$$g(u, v) = \alpha(u)\beta(v) - \alpha(v)\beta(u) .$$

We leave it to the reader to show that this is indeed a bilinear form. The mapping g is usually denoted by $\alpha \wedge \beta$, and is called the wedge product or the antisymmetric tensor product of α and β. In other words

$$(\alpha \wedge \beta)(u, v) = \alpha(u)\beta(v) - \alpha(v)\beta(u) .$$

Note that $\alpha \wedge \beta$ is just $\alpha \otimes \beta - \beta \otimes \alpha$. ∆
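For $V = \mathcal{F}^n$, write $\alpha(u) = a \cdot u$ and $\beta(v) = b \cdot v$ for fixed vectors a and b. Then, in the sense of Example 9.6, $\alpha \otimes \beta$ is represented by the matrix with entries $a_i b_j$ and $\alpha \wedge \beta$ by the matrix with entries $a_i b_j - a_j b_i$. The sketch below (our own illustration; all names are hypothetical) evaluates both products directly from their definitions and checks these matrix formulas.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def tensor(a, b):
    """(alpha ⊗ beta)(u, v) = alpha(u) beta(v), with alpha(u) = a·u and beta(v) = b·v."""
    return lambda u, v: dot(a, u) * dot(b, v)

def wedge(a, b):
    """(alpha ∧ beta)(u, v) = alpha(u) beta(v) - alpha(v) beta(u)."""
    return lambda u, v: dot(a, u) * dot(b, v) - dot(a, v) * dot(b, u)

def matrix_of(f, n):
    """a_ij = f(e_i, e_j) relative to the standard basis of F^n."""
    e = [[int(i == j) for j in range(n)] for i in range(n)]
    return [[f(e[i], e[j]) for j in range(n)] for i in range(n)]

a, b = [1, 2, 0], [3, -1, 4]
T_mat = matrix_of(tensor(a, b), 3)
W_mat = matrix_of(wedge(a, b), 3)
idx = range(3)
print(T_mat == [[ai * bj for bj in b] for ai in a])                    # True: outer product
print(all(W_mat[i][j] == T_mat[i][j] - T_mat[j][i] for i in idx for j in idx))  # True
print(all(W_mat[i][j] == -W_mat[j][i] for i in idx for j in idx))      # True: antisymmetric
```

The second check is the statement $\alpha \wedge \beta = \alpha \otimes \beta - \beta \otimes \alpha$, since the matrix of $\beta \otimes \alpha$ is the transpose of that of $\alpha \otimes \beta$.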
Generalizing Example 9.6 leads to the following theorem.

Theorem 9.9 Given a bilinear map $f: \mathcal{F}^m \times \mathcal{F}^n \to \mathcal{F}$, there exists a unique matrix $A \in M_{m\times n}(\mathcal{F})$ such that $f = f_A$. In other words, there exists a unique matrix A such that $f(X, Y) = X^T A Y$ for all $X \in \mathcal{F}^m$ and $Y \in \mathcal{F}^n$.

Proof In terms of the standard bases for $\mathcal{F}^m$ and $\mathcal{F}^n$, we have the column vectors $X = \sum_{i=1}^m x^i e_i \in \mathcal{F}^m$ and $Y = \sum_{j=1}^n y^j e_j \in \mathcal{F}^n$. Using the bilinearity of f we then have

$$f(X, Y) = f\Big(\sum_i x^i e_i, \sum_j y^j e_j\Big) = \sum_{i,j} x^i y^j f(e_i, e_j) .$$

If we define $a_{ij} = f(e_i, e_j)$, then we see that our expression becomes

$$f(X, Y) = \sum_{i,j} x^i a_{ij}\, y^j = X^T A Y .$$

To prove the uniqueness of the matrix A, suppose there exists a matrix A' such that $f = f_{A'}$. Then for all $X \in \mathcal{F}^m$ and $Y \in \mathcal{F}^n$ we have

$$f(X, Y) = X^T A Y = X^T A' Y$$

and hence $X^T(A - A')Y = 0$. Now let C = A - A' so that

$$X^T C Y = \sum_{i,j} c_{ij}\, x^i y^j = 0$$

for all $X \in \mathcal{F}^m$ and $Y \in \mathcal{F}^n$. In particular, choosing $X = e_i$ and $Y = e_j$, we find that $c_{ij} = 0$ for every i and j. Thus C = 0 so that A = A'. ∎

The matrix A defined in this theorem is said to represent the bilinear map f relative to the standard bases for $\mathcal{F}^m$ and $\mathcal{F}^n$. It thus appears that f is represented by the mn elements $a_{ij} = f(e_i, e_j)$. It is extremely important to realize that the elements $a_{ij}$ are defined by the expression $f(e_i, e_j)$ and, conversely, given a matrix $A = (a_{ij})$, we define the expression $f(e_i, e_j)$ by requiring that $f(e_i, e_j) = a_{ij}$. In other words, to say that we are given a bilinear map $f: \mathcal{F}^m \times \mathcal{F}^n \to \mathcal{F}$ means that we are given values of $f(e_i, e_j)$ for each i and j. Then, given these values, we can evaluate expressions of the form $f(X, Y) = \sum_{i,j} x^i y^j f(e_i, e_j)$. Conversely, if we are given each of the $f(e_i, e_j)$, then we have defined a bilinear map on $\mathcal{F}^m \times \mathcal{F}^n$.
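The remark that a bilinear map is completely determined by the numbers $f(e_i, e_j)$ is easy to see in code. In this sketch the map f is an arbitrary Python function of our own choosing, written without any matrix; the matrix A is read off as $a_{ij} = f(e_i, e_j)$, exactly as in the proof of Theorem 9.9, and $X^T A Y$ is then compared with f on a few sample vectors. The helper names are hypothetical.

```python
def f(X, Y):
    # A bilinear map on F^2 x F^3, written without reference to any matrix.
    x1, x2 = X
    y1, y2, y3 = Y
    return 2*x1*y1 - x1*y3 + 5*x2*y2 + 3*x2*y3

def std_basis(k, i):
    """The i-th standard basis vector of F^k."""
    return [int(j == i) for j in range(k)]

m, n = 2, 3
A = [[f(std_basis(m, i), std_basis(n, j)) for j in range(n)] for i in range(m)]

def f_A(X, Y):
    """X^T A Y = sum_ij x^i a_ij y^j."""
    return sum(X[i] * A[i][j] * Y[j] for i in range(m) for j in range(n))

print(A)  # [[2, 0, -1], [0, 5, 3]]
samples = [([1, -2], [3, 0, 4]), ([7, 1], [-1, 2, 5]), ([0, 3], [2, 2, -6])]
print(all(f(X, Y) == f_A(X, Y) for X, Y in samples))  # True
```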






We denote the set of all bilinear maps on U and V by $\mathcal{B}(U \times V, \mathcal{F})$, and the set of all bilinear forms as simply $\mathcal{B}(V) = \mathcal{B}(V \times V, \mathcal{F})$. It is easy to make $\mathcal{B}(U \times V, \mathcal{F})$ into a vector space over $\mathcal{F}$. To do so, we simply define

$$(af + bg)(u, v) = af(u, v) + bg(u, v)$$

for any $f, g \in \mathcal{B}(U \times V, \mathcal{F})$ and $a, b \in \mathcal{F}$. The reader should have no trouble showing that af + bg is itself a bilinear mapping.

It is left to the reader (see Exercise 9.4.1) to show that the association $A \mapsto f_A$ defined in Theorem 9.9 is actually an isomorphism between $M_{m\times n}(\mathcal{F})$ and $\mathcal{B}(\mathcal{F}^m \times \mathcal{F}^n, \mathcal{F})$. More generally, it should be clear that Theorem 9.9 applies equally well to any pair of finite-dimensional vector spaces U and V, and from now on we shall treat it as such.

Theorem 9.10 Let V be finite-dimensional over $\mathcal{F}$, and let V* have basis $\{\omega^i\}$. Define the elements $f^{ij} \in \mathcal{B}(V)$ by

$$f^{ij}(u, v) = \omega^i(u)\,\omega^j(v)$$

for all $u, v \in V$. Then $\{f^{ij}\}$ forms a basis for $\mathcal{B}(V)$, which thus has dimension $(\dim V)^2$.

Proof Let $\{e_i\}$ be the basis for V dual to the $\{\omega^i\}$ basis for V*. Given any $f \in \mathcal{B}(V)$, define $a_{ij} = f(e_i, e_j)$; we claim that $f = \sum_{i,j} a_{ij} f^{ij}$. To prove this, it suffices to show that $f(e_r, e_s) = \big(\sum_{i,j} a_{ij} f^{ij}\big)(e_r, e_s)$ for all r and s. We first note that

$$\Big(\sum_{i,j} a_{ij} f^{ij}\Big)(e_r, e_s) = \sum_{i,j} a_{ij}\, \omega^i(e_r)\,\omega^j(e_s) = \sum_{i,j} a_{ij}\, \delta^i_r\, \delta^j_s = a_{rs} = f(e_r, e_s) .$$

Since f is bilinear, it follows from this that $f(u, v) = \big(\sum_{i,j} a_{ij} f^{ij}\big)(u, v)$ for all $u, v \in V$ so that $f = \sum_{i,j} a_{ij} f^{ij}$. Hence $\{f^{ij}\}$ spans $\mathcal{B}(V)$.

Now suppose that $\sum_{i,j} a_{ij} f^{ij} = 0$ (note that this 0 is actually an element of $\mathcal{B}(V)$). Applying this to $(e_r, e_s)$ and using the above result, we see that

$$0 = \Big(\sum_{i,j} a_{ij} f^{ij}\Big)(e_r, e_s) = a_{rs} .$$

Therefore $\{f^{ij}\}$ is linearly independent and hence forms a basis for $\mathcal{B}(V)$. ∎
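To watch Theorem 9.10 at work with a basis other than the standard one, the sketch below (our own example) takes $V = \mathcal{F}^2$ with the basis $e_1 = (1, 1)$, $e_2 = (1, -1)$, whose dual basis is $\omega^1(x) = \tfrac12(x^1 + x^2)$, $\omega^2(x) = \tfrac12(x^1 - x^2)$. It checks the duality relations and then confirms that $f = \sum_{i,j} a_{ij} f^{ij}$ with $a_{ij} = f(e_i, e_j)$ for an arbitrarily chosen bilinear form f. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
e = [(1, 1), (1, -1)]                  # a basis of V = F^2
omega = [(half, half), (half, -half)]  # its dual basis, as row vectors

def evaluate(w, v):
    """Apply the functional with row vector w to the vector v."""
    return sum(p * q for p, q in zip(w, v))

# Check the defining property omega^i(e_j) = delta^i_j.
assert all(evaluate(omega[i], e[j]) == (i == j) for i, j in product(range(2), repeat=2))

def f(u, v):
    # An arbitrary bilinear form on F^2 (our own example).
    return 3*u[0]*v[0] - 2*u[0]*v[1] + 4*u[1]*v[0] + u[1]*v[1]

def f_basis(i, j):
    """f^{ij}(u, v) = omega^i(u) omega^j(v)."""
    return lambda u, v: evaluate(omega[i], u) * evaluate(omega[j], v)

a = {(i, j): f(e[i], e[j]) for i, j in product(range(2), repeat=2)}

def expansion(u, v):
    """(sum_ij a_ij f^{ij})(u, v)."""
    return sum(a[i, j] * f_basis(i, j)(u, v) for i, j in product(range(2), repeat=2))

tests = [((2, 5), (-1, 3)), ((0, 1), (4, 4)), ((7, -2), (1, 0))]
print(all(f(u, v) == expansion(u, v) for u, v in tests))  # True
```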
It should be mentioned in passing that the functions $f^{ij}$ defined in Theorem 9.10 can be written as the tensor product $\omega^i \otimes \omega^j: V \times V \to \mathcal{F}$ (see Example 9.7). Thus the set of bilinear forms $\omega^i \otimes \omega^j$ forms a basis for the space $V^* \otimes V^*$, which is called the tensor product of the two spaces V*. This remark is not meant to be a complete treatment by any means, and we will return to these ideas in Chapter 11.

We also note that if $\{e_i\}$ is a basis for V and dim V = n, then the matrix A of any $f \in \mathcal{B}(V)$ has elements $a_{ij} = f(e_i, e_j)$, and hence $A = (a_{ij})$ has $n^2$ independent elements. Thus, $\dim \mathcal{B}(V) = n^2$ as we saw above.

Theorem 9.11 Let P be the transition matrix from a basis $\{e_i\}$ for V to a new basis $\{e'_i\}$. If A is the matrix of $f \in \mathcal{B}(V)$ relative to $\{e_i\}$, then $A' = P^T A P$ is the matrix of f relative to the basis $\{e'_i\}$.

Proof Let $X, Y \in V$ be arbitrary. In Section 5.4 we showed that the transition matrix $P = (p_{ij})$ defined by $e'_i = P(e_i) = \sum_j e_j\, p_{ji}$ also transforms the components of $X = \sum_i x^i e_i = \sum_j x'^j e'_j$ as $x^i = \sum_j p_{ij}\, x'^j$. In matrix notation, this may be written as $[X]_e = P[X]_{e'}$ (see Theorem 5.17), and hence $[X]_e^T = [X]_{e'}^T P^T$. From Theorem 9.9 we then have

$$f(X, Y) = [X]_e^T A [Y]_e = [X]_{e'}^T P^T A P [Y]_{e'} = [X]_{e'}^T A' [Y]_{e'} .$$

Since X and Y are arbitrary, this shows that $A' = P^T A P$ is the unique representation of f in the new basis $\{e'_i\}$. ∎
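A numerical check of Theorem 9.11, with data of our own choosing: A represents f relative to the standard basis of $\mathcal{F}^2$, the columns of P are the new basis vectors, and the new coordinates satisfy $[X]_e = P[X]_{e'}$. The sketch verifies that $[X]_{e'}^T (P^T A P) [Y]_{e'}$ agrees with $[X]_e^T A [Y]_e$; the helper names are hypothetical.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def form(A, X, Y):
    """X^T A Y for column vectors given as lists."""
    return sum(X[i] * A[i][j] * Y[j] for i in range(len(X)) for j in range(len(Y)))

A = [[1,  2],
     [3, -1]]          # matrix of f relative to the standard basis
P = [[2, 1],
     [1, 1]]           # columns: the new basis vectors e'_1 = (2, 1), e'_2 = (1, 1)
A_new = matmul(matmul(transpose(P), A), P)

# Choose new coordinates X', Y' and recover the old ones from [X]_e = P [X]_e'.
Xp, Yp = [Fraction(3), Fraction(-2)], [Fraction(1, 2), Fraction(5)]
X = [sum(P[i][j] * Xp[j] for j in range(2)) for i in range(2)]
Y = [sum(P[i][j] * Yp[j] for j in range(2)) for i in range(2)]
print(A_new)                                   # [[13, 8], [9, 5]]
print(form(A, X, Y) == form(A_new, Xp, Yp))    # True
```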

Just as the transition matrix led to the definition of a similarity transformation, we now say that a matrix B is congruent to a matrix A if there exists a nonsingular matrix P such that $B = P^T A P$. It was shown in Exercise 5.2.12 that if P is nonsingular, then $r(AP) = r(PA) = r(A)$. Since P is nonsingular, $r(P^T) = r(P) = n$, so $P^T$ is also nonsingular, and hence $r(B) = r(P^T A P) = r(AP) = r(A)$. In other words, congruent matrices have the same rank. We are therefore justified in defining the rank r(f) of a bilinear form f on V to be the rank of any matrix representation of f. We leave it to the reader to show that f is nondegenerate if and only if r(f) = dim V (see Exercise 9.4.3).

Exercises 

1. Show that the association $A \mapsto f_A$ defined in Theorem 9.9 is an isomorphism between $M_{m\times n}(\mathcal{F})$ and $\mathcal{B}(\mathcal{F}^m \times \mathcal{F}^n, \mathcal{F})$.
2. Let $V = M_{m\times n}(\mathcal{F})$ and suppose $A \in M_m(\mathcal{F})$ is fixed. Then for any $X, Y \in V$ we define the mapping $f_A: V \times V \to \mathcal{F}$ by $f_A(X, Y) = \operatorname{Tr}(X^T A Y)$. Show that this defines a bilinear form on V.

3. Prove that a bilinear form f on V is nondegenerate if and only if r(f) = dim V.

4. (a) Let $V = \mathbb{R}^3$ and define $f \in \mathcal{B}(V)$ by

$$f(X, Y) = 3x^1y^1 - 2x^1y^2 + 5x^2y^1 + 7x^2y^2 - 8x^2y^3 + 4x^3y^2 - x^3y^3 .$$

Write out f(X, Y) as a matrix product $X^T A Y$.

(b) Suppose $A \in M_n(\mathcal{F})$ and let $f(X, Y) = X^T A Y$ for $X, Y \in \mathcal{F}^n$. Show that $f \in \mathcal{B}(\mathcal{F}^n)$.

5. Let $V = \mathbb{R}^2$ and define $f \in \mathcal{B}(V)$ by

$$f(X, Y) = 2x^1y^1 - 3x^1y^2 + x^2y^2 .$$

(a) Find the matrix representation A of f relative to the basis $v_1 = (1, 0)$, $v_2 = (1, 1)$.

(b) Find the matrix representation B of f relative to the basis $\bar v_1 = (2, 1)$, $\bar v_2 = (1, -1)$.

(c) Find the transition matrix P from the basis $\{v_i\}$ to the basis $\{\bar v_i\}$ and verify that $B = P^T A P$.

6. Let $V = M_n(\mathbb{C})$, and for all $A, B \in V$ define

$$f(A, B) = n \operatorname{Tr}(AB) - (\operatorname{Tr} A)(\operatorname{Tr} B) .$$

(a) Show that this defines a bilinear form on V.

(b) Let $U \subset V$ be the subspace of traceless matrices. Show that f is degenerate, but that $f_U = f|_U$ is nondegenerate.

(c) Let $W \subset V$ be the subspace of all traceless skew-Hermitian matrices A (i.e., Tr A = 0 and $A^\dagger = A^{*T} = -A$). Show that $f_W = f|_W$ is negative definite, i.e., that $f_W(A, A) < 0$ for all nonzero $A \in W$.

(d) Let $\tilde V \subset V$ be the set of all matrices $A \in V$ with the property that f(A, B) = 0 for all $B \in V$. Show that $\tilde V$ is a subspace of V. Give an explicit description of $\tilde V$ and find its dimension.
9.5 SYMMETRIC AND ANTISYMMETRIC BILINEAR FORMS

An extremely important type of bilinear form is one for which f(u, u) = 0 for all $u \in V$. Such forms are said to be alternating. If f is alternating, then for every $u, v \in V$ we have

$$\begin{aligned} 0 &= f(u + v, u + v) \\ &= f(u, u) + f(u, v) + f(v, u) + f(v, v) \\ &= f(u, v) + f(v, u) \end{aligned}$$

and hence

$$f(u, v) = -f(v, u) .$$

A bilinear form that satisfies this condition is called antisymmetric (or skew-symmetric). If we let v = u, then this becomes f(u, u) + f(u, u) = 0. As long as $\mathcal{F}$ is not of characteristic 2 (see the discussion following Theorem 4.3; this is equivalent to the statement that $1 + 1 \neq 0$ in $\mathcal{F}$), we can conclude that f(u, u) = 0. Thus, as long as the base field $\mathcal{F}$ is not of characteristic 2, alternating and antisymmetric forms are equivalent. We will always assume that $1 + 1 \neq 0$ in $\mathcal{F}$ unless otherwise noted, and hence we always assume the equivalence of alternating and antisymmetric forms.

It is also worth pointing out the simple fact that the diagonal matrix elements of any representation of an alternating (or antisymmetric) bilinear form will necessarily be zero. This is because the diagonal elements are given by $a_{ii} = f(e_i, e_i) = 0$.

Theorem 9.12 Let $f \in \mathcal{B}(V)$ be alternating. Then there exists a basis for V in which the matrix A of f takes the block diagonal form

$$A = M \oplus \cdots \oplus M \oplus 0 \oplus \cdots \oplus 0$$

where 0 is the 1 × 1 matrix (0), and

$$M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} .$$

Moreover, the number of blocks consisting of the matrix M is just (1/2)r(f).
Proof We first note that the theorem is clearly true if f = 0. Next we note that if dim V = 1, then any vector $v_i \in V$ is of the form $v_i = a_i u$ for some basis vector u and scalar $a_i$. Therefore, for any $v_1, v_2 \in V$ we have

$$f(v_1, v_2) = f(a_1u, a_2u) = a_1a_2\, f(u, u) = 0$$

so that again f = 0. We now assume that $f \neq 0$ and that dim V > 1, and proceed by induction on dim V. In other words, we assume the theorem is true for dim V < n, and proceed to show that it is also true for dim V = n.

Since dim V > 1 and $f \neq 0$, there exist nonzero vectors $u_1, u_2 \in V$ such that $f(u_1, u_2) \neq 0$. Moreover, we can always multiply $u_1$ by the appropriate scalar so that

$$f(u_1, u_2) = 1 = -f(u_2, u_1) .$$

It is also true that $u_1$ and $u_2$ must be linearly independent because if $u_2 = ku_1$, then $f(u_1, u_2) = f(u_1, ku_1) = kf(u_1, u_1) = 0$. We can now define the two-dimensional subspace $U \subset V$ spanned by the vectors $\{u_1, u_2\}$. By definition, the matrix $(a_{ij}) \in M_2(\mathcal{F})$ of f restricted to U is given by $a_{ij} = f(u_i, u_j)$, and hence it is easy to see that $(a_{ij})$ is given by the matrix M defined in the statement of the theorem.

Since any $u \in U$ is of the form $u = au_1 + bu_2$, we see that

$$f(u, u_1) = af(u_1, u_1) + bf(u_2, u_1) = -b$$

and

$$f(u, u_2) = af(u_1, u_2) + bf(u_2, u_2) = a .$$

Now define the set

$$W = \{w \in V : f(w, u) = 0 \text{ for every } u \in U\} .$$

We claim that $V = U \oplus W$ (compare this with Theorem 2.22). To show that $U \cap W = \{0\}$, we assume that $v \in U \cap W$. Then $v \in U$ has the form $v = \alpha u_1 + \beta u_2$ for some scalars α and β. But $v \in W$ so that $0 = f(v, u_1) = -\beta$ and $0 = f(v, u_2) = \alpha$, and hence v = 0.

We now show that V = U + W. Let $v \in V$ be arbitrary, and define the vectors

$$\begin{aligned} u &= f(v, u_2)\,u_1 - f(v, u_1)\,u_2 \in U \\ w &= v - u . \end{aligned}$$

If we can show that $w \in W$, then we will have shown that $v = u + w \in U + W$ as desired. But this is easy to do since we have

$$\begin{aligned} f(u, u_1) &= f(v, u_2)f(u_1, u_1) - f(v, u_1)f(u_2, u_1) = f(v, u_1) \\ f(u, u_2) &= f(v, u_2)f(u_1, u_2) - f(v, u_1)f(u_2, u_2) = f(v, u_2) \end{aligned}$$

and therefore we find that

$$\begin{aligned} f(w, u_1) &= f(v - u, u_1) = f(v, u_1) - f(u, u_1) = 0 \\ f(w, u_2) &= f(v - u, u_2) = f(v, u_2) - f(u, u_2) = 0 . \end{aligned}$$

These equations show that f(w, u) = 0 for every $u \in U$, and thus $w \in W$. This completes the proof that $V = U \oplus W$, and hence it follows that $\dim W = \dim V - \dim U = n - 2 < n$.

Next we note that the restriction of f to W is just an alternating bilinear form on W, and therefore, by our induction hypothesis, there exists a basis $\{u_3, . . . , u_n\}$ for W such that the matrix of f restricted to W has the desired form. Since f(w, u) = 0 and hence also f(u, w) = -f(w, u) = 0 for all $u \in U$ and $w \in W$, the matrix of f on V is the direct sum of the matrices of f on U and on W, where the matrix on U was shown above to be M. Therefore $\{u_1, u_2, . . . , u_n\}$ is a basis for V in which the matrix of f has the desired form.

Finally, it should be clear that the rows of the matrix of f that are made up of the portion $M \oplus \cdots \oplus M$ are necessarily linearly independent (by definition of direct sum and the fact that the rows of M are independent). Since each M contains two rows, we see that r(f) = rr(f) is precisely twice the number of M matrices in the direct sum. ∎
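The proof of Theorem 9.12 is constructive, and the sketch below (our own rendering of it, for $f(X, Y) = X^T A Y$ with A antisymmetric) follows it literally: find a pair with $f(u_1, u_2) \neq 0$, rescale $u_1$ so that $f(u_1, u_2) = 1$, replace each remaining basis vector v by $w = v - f(v, u_2)u_1 + f(v, u_1)u_2 \in W$, and repeat on W. When f vanishes on what is left, those vectors supply the 0 blocks. It favors clarity over efficiency, and the function names are hypothetical.

```python
from fractions import Fraction

def bilinear(A):
    """f(x, y) = x^T A y."""
    return lambda x, y: sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def canonical_basis(A):
    """Basis in which the antisymmetric matrix A becomes M + ... + M + 0 + ... + 0 (direct sum)."""
    n = len(A)
    f = bilinear(A)
    pool = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # basis of current subspace
    blocks, zeros = [], []
    while pool:
        pair = next(((i, j) for i in range(len(pool)) for j in range(len(pool))
                     if f(pool[i], pool[j]) != 0), None)
        if pair is None:                  # f vanishes on span(pool): these give the 0 blocks
            zeros.extend(pool)
            break
        i, j = pair
        c = f(pool[i], pool[j])
        u1 = [x / c for x in pool[i]]     # rescale so that f(u1, u2) = 1
        u2 = pool[j]
        blocks += [u1, u2]
        rest = [v for k, v in enumerate(pool) if k not in (i, j)]
        # w = v - f(v, u2) u1 + f(v, u1) u2 satisfies f(w, u1) = f(w, u2) = 0.
        pool = [[v[k] - f(v, u2) * u1[k] + f(v, u1) * u2[k] for k in range(n)] for v in rest]
    return blocks + zeros

def matrix_in_basis(A, basis):
    f = bilinear(A)
    return [[f(x, y) for y in basis] for x in basis]

A1 = [[0, 2, 1, 0],
      [-2, 0, 3, 0],
      [-1, -3, 0, 0],
      [0, 0, 0, 0]]                       # antisymmetric, rank 2
print([[int(x) for x in row] for row in matrix_in_basis(A1, canonical_basis(A1))])
# [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

As the theorem predicts, the rank-2 example produces exactly one M block.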

Corollary 1 Any nonzero alternating bilinear form must have even rank. 

Proof Since the number of M blocks in the matrix of f is (1/2)r(f), it follows that r(f) must be an even number. ∎

Corollary 2 If there exists a nondegenerate, alternating form on V, then 
dim V is even. 

Proof This is Exercise 9.5.7. ∎

If $f \in \mathcal{B}(V)$ is alternating, then the matrix elements representing f relative to any basis $\{e_i\}$ for V are given by

$$a_{ij} = f(e_i, e_j) = -f(e_j, e_i) = -a_{ji} .$$

Any matrix $A = (a_{ij}) \in M_n(\mathcal{F})$ with the property that $a_{ij} = -a_{ji}$ (i.e., $A = -A^T$) is said to be antisymmetric. If we are given any element $a_{ij}$ of an antisymmetric matrix, then we automatically know $a_{ji}$. Because of this, we say that $a_{ij}$ and $a_{ji}$ are not independent. Since the diagonal elements of any such antisymmetric matrix must be zero, this means that the maximum number of independent elements in A is given by $(n^2 - n)/2$. Therefore, the subspace of $\mathcal{B}(V)$ consisting of alternating bilinear forms is of dimension n(n - 1)/2.
Another extremely important class of bilinear forms on V is that for which f(u, v) = f(v, u) for all $u, v \in V$. In this case we say that f is symmetric, and we have the matrix representation

$$a_{ij} = f(e_i, e_j) = f(e_j, e_i) = a_{ji} .$$

As expected, any matrix $A = (a_{ij})$ with the property that $a_{ij} = a_{ji}$ (i.e., $A = A^T$) is said to be symmetric. In this case, the number of independent elements of A is $[(n^2 - n)/2] + n = (n^2 + n)/2$, and hence the subspace of $\mathcal{B}(V)$ consisting of symmetric bilinear forms has dimension n(n + 1)/2.

It is also easy to prove generally that a matrix $A \in M_n(\mathcal{F})$ represents a symmetric bilinear form on V if and only if A is a symmetric matrix. Indeed, if f is a symmetric bilinear form, then for all $X, Y \in V$ we have

$$X^T A Y = f(X, Y) = f(Y, X) = Y^T A X .$$

But $X^T A Y$ is just a 1 × 1 matrix, and hence $(X^T A Y)^T = X^T A Y$. Therefore (using Theorem 3.18) we have

$$Y^T A X = X^T A Y = (X^T A Y)^T = Y^T A^T X^{TT} = Y^T A^T X .$$

Since X and Y are arbitrary, this implies that $A = A^T$. Conversely, suppose that A is a symmetric matrix. Then for all $X, Y \in V$ we have

$$X^T A Y = (X^T A Y)^T = Y^T A^T X^{TT} = Y^T A X$$

so that A represents a symmetric bilinear form. The analogous result holds for antisymmetric bilinear forms as well (see Exercise 9.5.2).

Note that adding the dimensions of the symmetric and antisymmetric subspaces of $\mathcal{B}(V)$ we find

$$n(n - 1)/2 + n(n + 1)/2 = n^2 = \dim \mathcal{B}(V) .$$

This should not be surprising since, for an arbitrary bilinear form $f \in \mathcal{B}(V)$ and any $X, Y \in V$, we can always write

$$f(X, Y) = (1/2)[f(X, Y) + f(Y, X)] + (1/2)[f(X, Y) - f(Y, X)] .$$

In other words, any bilinear form can always be written as the sum of a symmetric and an antisymmetric bilinear form.
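In matrix terms this says $A = \tfrac12(A + A^T) + \tfrac12(A - A^T)$, with the first summand symmetric and the second antisymmetric (the factor 1/2 again requires $1 + 1 \neq 0$). A minimal sketch with an arbitrary example matrix of our own, using exact rational arithmetic:

```python
from fractions import Fraction

def split(A):
    """Return (S, K) with A = S + K, S symmetric and K antisymmetric."""
    n = len(A)
    half = Fraction(1, 2)
    S = [[half * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    K = [[half * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
    return S, K

A = [[1, 4, -2],
     [0, 3,  5],
     [6, 1,  0]]
S, K = split(A)
n = len(A)
print(all(S[i][j] == S[j][i] and K[i][j] == -K[j][i] and S[i][j] + K[i][j] == A[i][j]
          for i in range(n) for j in range(n)))  # True
```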
There is another particular type of form that is worth distinguishing. In particular, let V be finite-dimensional over $\mathcal{F}$, and let f = ⟨ , ⟩ be a symmetric bilinear form on V. We define the mapping $q: V \to \mathcal{F}$ by

$$q(X) = f(X, X) = \langle X, X\rangle$$

for every $X \in V$. The mapping q is called the quadratic form associated with the symmetric bilinear form f. It is clear that (by definition) q is represented by a symmetric matrix A, and hence it may be written in the alternative forms

$$q(X) = X^T A X = \sum_{i,j} a_{ij}\, x^i x^j = \sum_i a_{ii}(x^i)^2 + 2\sum_{i<j} a_{ij}\, x^i x^j .$$

This expression for q in terms of the variables $x^i$ is called the quadratic polynomial corresponding to the symmetric matrix A. In the case where A happens to be a diagonal matrix, $a_{ij} = 0$ for $i \neq j$ and we are left with the simple form $q(X) = a_{11}(x^1)^2 + \cdots + a_{nn}(x^n)^2$. In other words, the quadratic polynomial corresponding to a diagonal matrix contains no "cross product" terms.

While we will show below that every quadratic form has a diagonal representation, let us first look at a special case.

Example 9.8 Consider the real quadratic polynomial on $\mathbb{R}^n$ defined by

$$q(Y) = \sum_{i,j} b_{ij}\, y^i y^j$$

(where $b_{ij} = b_{ji}$ as usual for a quadratic form). If it happens that $b_{ii} = 0$ for every i but, for example, that $b_{12} \neq 0$, then we make the substitutions

$$y^1 = x^1 + x^2, \qquad y^2 = x^1 - x^2, \qquad y^i = x^i \text{ for } i = 3, . . . , n .$$

A little algebra (which you should check) then shows that q(Y) takes the form

$$q(Y) = \sum_{i,j} c_{ij}\, x^i x^j$$

where now $c_{11} \neq 0$. This means that we can focus our attention on the case $q(X) = \sum_{i,j} a_{ij}\, x^i x^j$ where it is assumed that $a_{11} \neq 0$.
Thus, given the real quadratic form $q(X) = \sum_{i,j} a_{ij}\, x^i x^j$ where $a_{11} \neq 0$, let us make the substitutions

$$\begin{aligned} x^1 &= y^1 - (1/a_{11})[a_{12}y^2 + \cdots + a_{1n}y^n] \\ x^i &= y^i \text{ for each } i = 2, . . . , n . \end{aligned}$$

Some more algebra shows that q(X) now takes the form

$$q(x^1, . . . , x^n) = a_{11}(y^1)^2 + q'(y^2, . . . , y^n)$$

where q' is a new quadratic polynomial. Continuing this process, we eventually arrive at a new set of variables in which q has a diagonal representation. This is called completing the square. ∆

Given any quadratic form q, it is possible to fully recover the values of f from those of q. To show this, let $u, v \in V$ be arbitrary. Then

$$\begin{aligned} q(u + v) &= \langle u + v, u + v\rangle \\ &= \langle u, u\rangle + \langle u, v\rangle + \langle v, u\rangle + \langle v, v\rangle \\ &= q(u) + 2f(u, v) + q(v) \end{aligned}$$

and therefore

$$f(u, v) = (1/2)[q(u + v) - q(u) - q(v)] .$$

This equation is called the polar form of f.
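The polar form is easy to test: in the sketch below (with an example matrix of our own) q is computed from a symmetric matrix A, f is then recovered from q alone, and the result is compared with $X^T A Y$.

```python
from fractions import Fraction

A = [[2, 1, 0],
     [1, -3, 4],
     [0, 4, 1]]   # symmetric

def q(x):
    return sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

def f_from_q(u, v):
    """Polar form: f(u, v) = (1/2)[q(u + v) - q(u) - q(v)]."""
    s = [a + b for a, b in zip(u, v)]
    return Fraction(q(s) - q(u) - q(v), 2)

def f(u, v):
    return sum(A[i][j] * u[i] * v[j] for i in range(3) for j in range(3))

pairs = [([1, 0, 2], [3, -1, 1]), ([0, 5, -2], [4, 4, 0]), ([1, 1, 1], [-2, 3, 7])]
print(all(f_from_q(u, v) == f(u, v) for u, v in pairs))  # True
```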

Theorem 9.13 Let f be a symmetric bilinear form on a finite-dimensional space V. Then there exists a basis $\{e_i\}$ for V in which f is represented by a diagonal matrix. Alternatively, if f is represented by a (symmetric) matrix A in one basis, then there exists a nonsingular transition matrix P to the basis $\{e_i\}$ such that $P^T A P$ is diagonal.

Proof Since the theorem clearly holds if either f = 0 or dim V = 1, we assume that $f \neq 0$ and dim V = n > 1, and proceed by induction on dim V. If q(u) = f(u, u) = 0 for all $u \in V$, then the polar form of f shows that f = 0, a contradiction. Therefore, there must exist a vector $v_1 \in V$ such that $f(v_1, v_1) \neq 0$. Now let U be the (one-dimensional) subspace of V spanned by $v_1$, and define the subspace $W = \{u \in V : f(u, v_1) = 0\}$. We claim that $V = U \oplus W$.

Suppose $v \in U \cap W$. Then $v \in U$ implies that $v = kv_1$ for some scalar k, and hence $v \in W$ implies $0 = f(v, v_1) = kf(v_1, v_1)$. But since $f(v_1, v_1) \neq 0$ we must have k = 0, and thus $v = kv_1 = 0$. This shows that $U \cap W = \{0\}$.
Now let $v \in V$ be arbitrary, and define

$$w = v - [f(v, v_1)/f(v_1, v_1)]\,v_1 .$$

Then

$$f(w, v_1) = f(v, v_1) - [f(v, v_1)/f(v_1, v_1)]\,f(v_1, v_1) = 0$$

and hence $w \in W$. Since the definition of w shows that any $v \in V$ is the sum of $w \in W$ and an element of U, we have shown that V = U + W, and hence $V = U \oplus W$.

We now consider the restriction of f to W, which is just a symmetric bilinear form on W. Since dim W = dim V - dim U = n - 1, our induction hypothesis shows there exists a basis $\{e_2, . . . , e_n\}$ for W such that $f(e_i, e_j) = 0$ for all $i \neq j$ where i, j = 2, . . . , n. But the definition of W shows that $f(e_i, v_1) = 0$ (and hence, by symmetry, $f(v_1, e_i) = 0$) for each i = 2, . . . , n, and thus if we define $e_1 = v_1$, the basis $\{e_1, . . . , e_n\}$ for V has the property that $f(e_i, e_j) = 0$ for all $i \neq j$ where now i, j = 1, . . . , n. This shows that the matrix of f in the basis $\{e_i\}$ is diagonal. The alternate statement in the theorem follows from Theorem 9.11. ∎
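The inductive proof of Theorem 9.13 is itself an algorithm. The sketch below is our own rendering of it for $f(X, Y) = X^T A Y$ on $\mathcal{F}^n$: choose $v_1$ with $f(v_1, v_1) \neq 0$ (if every remaining basis vector has $f(v, v) = 0$ but $f(v_k, v_l) \neq 0$, then $v_k + v_l$ works, as the polar form shows), replace each remaining basis vector v by $w = v - [f(v, v_1)/f(v_1, v_1)]v_1$, and recurse on W. The names `diagonalizing_basis` and `gram` are hypothetical.

```python
from fractions import Fraction

def diagonalizing_basis(A):
    """Basis of F^n in which f(X, Y) = X^T A Y is diagonal (A symmetric, char F != 2)."""
    n = len(A)

    def f(x, y):
        return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

    pool = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # basis of current W
    result = []
    while pool:
        # Look for v1 with f(v1, v1) != 0: first in the pool, then among sums of two pool vectors.
        choice = next(((k, pool[k]) for k in range(len(pool))
                       if f(pool[k], pool[k]) != 0), None)
        if choice is None:
            choice = next(((k, [a + b for a, b in zip(pool[k], pool[l])])
                           for k in range(len(pool)) for l in range(len(pool))
                           if k != l and f(pool[k], pool[l]) != 0), None)
        if choice is None:            # f vanishes on span(pool): keep these vectors as they are
            result.extend(pool)
            break
        k, v1 = choice
        c = f(v1, v1)
        result.append(v1)
        rest = pool[:k] + pool[k + 1:]
        # w = v - [f(v, v1)/f(v1, v1)] v1 lies in W = {u : f(u, v1) = 0}.
        pool = [[x - f(v, v1) / c * y for x, y in zip(v, v1)] for v in rest]
    return result

def gram(A, basis):
    """Matrix (f(b_i, b_j)) of X^T A Y relative to the given basis."""
    n = len(A)
    return [[sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n)) for y in basis]
            for x in basis]

A = [[1, -3, 2], [-3, 7, -5], [2, -5, 8]]   # the matrix of Example 9.9
G = gram(A, diagonalizing_basis(A))
print(*[G[i][i] for i in range(3)])          # 1 -2 9/2
```

For the matrix of Example 9.9 this gives the diagonal entries 1, -2 and 9/2. The third basis vector produced here, (-1/2, 1/2, 1), is half of the third column of the P found in that example, which accounts for the entry 18 = 4 · 9/2 there.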

In the next section, we shall show explicitly how this diagonalization can 
be carried out. 



Exercises 



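1. (a) Show that if f is a nondegenerate, antisymmetric bilinear form on V, then n = dim V is even.

(b) Show that there exists a basis for V in which the matrix of f takes the block matrix form

$$\begin{pmatrix} 0 & D \\ -D & 0 \end{pmatrix}$$

where D is the (n/2) × (n/2) matrix

$$D = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & \cdots & 0 & 0 \end{pmatrix} .$$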

2. Show that a matrix $A \in M_n(\mathcal{F})$ represents an antisymmetric bilinear form on V if and only if A is antisymmetric.
3. Reduce each of the following quadratic forms to diagonal form:

(a) $q(x, y, z) = 2x^2 - 8xy + y^2 - 16xz + 14yz + 5z^2$.

(b) $q(x, y, z) = x^2 - xz + y^2$.

(c) $q(x, y, z) = xy + y^2 + 4xz + z^2$.

(d) $q(x, y, z) = xy + yz$.

4. (a) Find all antisymmetric bilinear forms on $\mathbb{R}^3$.

(b) Find a basis for the space of all antisymmetric bilinear forms on $\mathbb{R}^n$.

5. Let V be finite-dimensional over $\mathbb{C}$. Prove:

(a) The equation

$$(Ef)(u, v) = (1/2)[f(u, v) - f(v, u)]$$

for every $f \in \mathcal{B}(V)$ defines a linear operator E on $\mathcal{B}(V)$.

(b) E is a projection, i.e., $E^2 = E$.

(c) If $T \in L(V)$, the equation

$$(T^\dagger f)(u, v) = f(Tu, Tv)$$

defines a linear operator $T^\dagger$ on $\mathcal{B}(V)$.

(d) $ET^\dagger = T^\dagger E$ for all $T \in L(V)$.

6. Let V be finite-dimensional over $\mathbb{C}$, and suppose $f, g \in \mathcal{B}(V)$ are antisymmetric. Show there exists an invertible $T \in L(V)$ such that f(Tu, Tv) = g(u, v) for all $u, v \in V$ if and only if f and g have the same rank.

7. Prove Corollary 2 of Theorem 9.12.

9.6 DIAGONALIZATION OF SYMMETRIC BILINEAR FORMS 

Now that we know any symmetric bilinear form f can be diagonalized, let us look at how this can actually be carried out. After this discussion, we will give an example that should clarify everything. (The algorithm that we are about to describe may be taken as an independent proof of Theorem 9.13.) Let the (symmetric) matrix representation of f be $A = (a_{ij}) \in M_n(\mathcal{F})$, and first assume that $a_{11} \neq 0$. For each i = 2, . . . , n we multiply the ith row of A by $a_{11}$, and then add $-a_{i1}$ times the first row to this new ith row. In other words, this combination of two elementary row operations results in $A_i \to a_{11}A_i - a_{i1}A_1$. Following this procedure for each i = 2, . . . , n yields the first column of A in the form $A^1 = (a_{11}, 0, . . . , 0)$ (remember that this is a column vector, not a row vector). We now want to put the first row of A into the same form. However, this is easy because A is symmetric. We thus perform exactly the same operations (in the same sequence), but on columns instead of rows, resulting in $A^i \to a_{11}A^i - a_{1i}A^1$. Therefore the first row is also transformed into the form $A_1 = (a_{11}, 0, . . . , 0)$. In other words, this sequence of operations results in the transformed A having the block matrix form

$$\begin{pmatrix} a_{11} & 0 \\ 0 & B \end{pmatrix}$$

where B is a matrix of size less than that of A. We can also write this in the form $(a_{11}) \oplus B$.

Now look carefully at what we did for the case of i = 2. Let us denote the multiplication operation by the elementary matrix $E_m$, and the addition operation by $E_a$ (see Section 3.8). Then what was done in performing the row operations was simply to carry out the multiplication $(E_aE_m)A$. Next, because A is symmetric, we carried out exactly the same operations but applied to the columns instead of the rows. As we saw at the end of Section 3.8, this is equivalent to the multiplication $A(E_m^TE_a^T)$. In other words, for i = 2 we effectively carried out the multiplication

$$E_aE_m A E_m^TE_a^T .$$

For each succeeding value of i we then carried out this same procedure, and the final net effect on A was simply a multiplication of the form

$$E_s \cdots E_1 A E_1^T \cdots E_s^T$$

which resulted in the block matrix $(a_{11}) \oplus B$ shown above. Furthermore, note that if we let $S = E_1^T \cdots E_s^T = (E_s \cdots E_1)^T$, then $(a_{11}) \oplus B = S^T A S$ must be symmetric since $(S^T A S)^T = S^T A^T S = S^T A S$. This means that in fact the matrix B must also be symmetric.

We can now repeat this procedure on the matrix B and, by induction, we eventually arrive at a diagonal representation of A given by

$$D = E_r \cdots E_1 A E_1^T \cdots E_r^T$$

for some set of elementary row transformations $E_i$. But from Theorems 9.11 and 9.13, we know that $D = P^T A P$, and therefore $P^T$ is given by the product $e_r \cdots e_1(I) = E_r \cdots E_1$ of elementary row operations applied to the identity matrix exactly as they were applied to A. It should be emphasized that we were able to arrive at this conclusion only because A is symmetric, thereby allowing each column operation to be the transpose of the corresponding row operation. Note, however, that while the order of the row and column operations performed is important within their own group, the associativity of the matrix product allows the column operations (as a group) to be performed independently of the row operations.

We still must take into account the case where $a_{11} = 0$. If $a_{11} = 0$ but $a_{ii} \neq 0$ for some i > 1, then we can bring $a_{ii}$ into the first diagonal position by interchanging the ith row and column with the first row and column respectively. We then follow the procedure given above. If $a_{ii} = 0$ for every i = 1, . . . , n, then we can pick any $a_{ij} \neq 0$ and apply the operations $A_i \to A_i + A_j$ and $A^i \to A^i + A^j$. This puts $2a_{ij} \neq 0$ into the ith diagonal position, and allows us to proceed as in the previous case (which then goes into the first case treated). (Note also that this last procedure requires that our field is not of characteristic 2 because we assumed that $a_{ij} + a_{ij} = 2a_{ij} \neq 0$.)
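The procedure just described can be sketched in Python as follows (our own illustration, not the authors' code). Row operations act on both halves of (A|I) and the matching column operations on the left half only, so the final matrix is $(D|P^T)$. The elimination step uses the rule $A_i \to a_{11}A_i - a_{i1}A_1$ literally, and the two zero-pivot cases are handled by the interchange and by $A_i \to A_i + A_j$ as described above. All names are hypothetical.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def congruence_diagonalize(A):
    """Apply the row/column procedure to (A|I); return (D, Pt) with D = Pt A Pt^T diagonal."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]                       # left block -> D
    Pt = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # right block -> P^T

    def row_op(i, j, a, b):   # A_i -> a A_i + b A_j, on both blocks
        for X in (M, Pt):
            X[i] = [a * x + b * y for x, y in zip(X[i], X[j])]

    def col_op(i, j, a, b):   # A^i -> a A^i + b A^j, on the left block only
        for row in M:
            row[i] = a * row[i] + b * row[j]

    for k in range(n):
        if M[k][k] == 0:
            s = next((i for i in range(k + 1, n) if M[i][i] != 0), None)
            if s is not None:        # interchange row/column k with row/column s
                for X in (M, Pt):
                    X[k], X[s] = X[s], X[k]
                for row in M:
                    row[k], row[s] = row[s], row[k]
            else:                    # every remaining a_ii is 0: A_k -> A_k + A_j gives 2a_kj
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue         # row and column k are already zero
                row_op(k, j, 1, 1)
                col_op(k, j, 1, 1)
        p = M[k][k]
        for i in range(k + 1, n):
            c = M[i][k]
            if c != 0:
                row_op(i, k, p, -c)  # A_i -> a_kk A_i - a_ik A_k
                col_op(i, k, p, -c)  # the matching column operation
    return M, Pt

A = [[1, -3, 2], [-3, 7, -5], [2, -5, 8]]   # the matrix of Example 9.9
D, Pt = congruence_diagonalize(A)
print(*[D[i][i] for i in range(3)])
for row in Pt:
    print(*row)
print(matmul(matmul(Pt, A), transpose(Pt)) == D)   # True: P^T A P = D
```

On the matrix of Example 9.9 below this gives D = diag(1, -2, 18), as in the example. The last row of $P^T$ comes out as (1, -1, -2) rather than (-1, 1, 2) because the sketch applies $-2A_3 - A_2$ where the example uses $2A_3 + A_2$; both choices are valid.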



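Example 9.9 Let us find the transition matrix P such that $D = P^T A P$ is diagonal, with A given by

$$A = \begin{pmatrix} 1 & -3 & 2 \\ -3 & 7 & -5 \\ 2 & -5 & 8 \end{pmatrix} .$$

We begin by forming the matrix (A|I):

$$\left(\begin{array}{rrr|rrr} 1 & -3 & 2 & 1 & 0 & 0 \\ -3 & 7 & -5 & 0 & 1 & 0 \\ 2 & -5 & 8 & 0 & 0 & 1 \end{array}\right) .$$

Now carry out the following sequence of elementary row operations to both A and I, and identical column operations to A only. The row operations $A_2 + 3A_1$ and $A_3 - 2A_1$ give

$$\left(\begin{array}{rrr|rrr} 1 & -3 & 2 & 1 & 0 & 0 \\ 0 & -2 & 1 & 3 & 1 & 0 \\ 0 & 1 & 4 & -2 & 0 & 1 \end{array}\right)$$

and the corresponding column operations $A^2 + 3A^1$ and $A^3 - 2A^1$ then give

$$\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 3 & 1 & 0 \\ 0 & 1 & 4 & -2 & 0 & 1 \end{array}\right) .$$

Next, the row operation $2A_3 + A_2$ gives

$$\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 3 & 1 & 0 \\ 0 & 0 & 9 & -1 & 1 & 2 \end{array}\right)$$

and the corresponding column operation $2A^3 + A^2$ finally gives

$$\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & -2 & 0 & 3 & 1 & 0 \\ 0 & 0 & 18 & -1 & 1 & 2 \end{array}\right) .$$

We have thus diagonalized A, and the final form of the matrix (A|I) is just $(D|P^T)$, with $D = \operatorname{diag}(1, -2, 18)$. ∆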

Since Theorem 9.13 tells us that every symmetric bilinear form has a diagonal representation, it follows that the associated quadratic form q(X) has the diagonal representation

$$q(X) = X^T A X = a_{11}(x^1)^2 + \cdots + a_{nn}(x^n)^2$$

where A is the diagonal matrix representing the (symmetric) bilinear form.

Let us now specialize this discussion somewhat and consider only real 
symmetric bilinear forms. We begin by noting that in general, the diagonal 
representation of a symmetric bilinear form f has positive, negative, and zero 
entries. We can always renumber the basis vectors so that the positive entries 
appear first, followed by the negative entries and then the zero entries. It is in 
fact true, as we now show, that any other diagonal representation of f has the 
same number of positive and negative entries. If there are P positive entries 
and N negative entries, then the difference $S = P - N$ is called the signature
of $f$.

Theorem 9.14 Let $f \in \mathcal{B}(V)$ be a real symmetric bilinear form. Then every
diagonal representation of f has the same number of positive and negative 
entries. 
Proof Let $\{e_1, \dots, e_n\}$ be the basis for $V$ in which the matrix of $f$ is diagonal
(see Theorem 9.13). By suitably numbering the $e_i$, we may assume that the
first $P$ entries are positive and the next $N$ entries are negative (also note that
there could be $n - P - N$ zero entries). Now let $\{e'_1, \dots, e'_n\}$ be another basis
for V in which the matrix of f is also diagonal. Again, assume that the first P' 
entries are positive and the next N' entries are negative. Since the rank of f is 
just the rank of any matrix representation of f, and since the rank of a matrix is 
just the dimension of its row (or column) space, it is clear that r(f) = P + N = 
P' + N'. Because of this, we need only show that P = P'. 

Let $U$ be the linear span of the $P$ vectors $\{e_1, \dots, e_P\}$, let $W$ be the linear
span of $\{e'_{P'+1}, \dots, e'_n\}$, and note that $\dim U = P$ and $\dim W = n - P'$. Then
for all nonzero vectors $u \in U$ and $w \in W$, we have $f(u, u) > 0$ and $f(w, w) \leq 0$
(this inequality is $\leq$ and not $<$ because if $P' + N' \neq n$, then the last of the basis
vectors that span $W$ will define a diagonal element in the matrix of $f$ that is 0).
Hence it follows that $U \cap W = \{0\}$, and therefore (by Theorem 2.11)

$$\dim(U + W) = \dim U + \dim W - \dim(U \cap W) = P + n - P' - 0 = P - P' + n.$$

Since $U$ and $W$ are subspaces of $V$, it follows that $\dim(U + W) \leq \dim V = n$,
and therefore $P - P' + n \leq n$. This shows that $P \leq P'$. Had we let $U$ be the span
of $\{e'_1, \dots, e'_{P'}\}$ and $W$ be the span of $\{e_{P+1}, \dots, e_n\}$, we would have found
that $P' \leq P$. Therefore $P = P'$ as claimed. ∎

While Theorem 9.13 showed that any quadratic form has a diagonal repre- 
sentation, the important special case of a real quadratic form allows an even 
simpler representation. This corollary is known as Sylvester's theorem or the 
law of inertia. 

Corollary Let $f$ be a real symmetric bilinear form. Then $f$ has a unique diagonal
representation of the form

$$\begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}$$

where $I_r$ and $I_s$ are the $r \times r$ and $s \times s$ unit matrices, and $0_t$ is the $t \times t$ zero
matrix. In particular, the associated quadratic form $q$ has a representation of
the form

$$q(x^1, \dots, x^n) = (x^1)^2 + \cdots + (x^r)^2 - (x^{r+1})^2 - \cdots - (x^{r+s})^2.$$
Proof Let $f$ be represented by a (real) symmetric $n \times n$ matrix $A$. By Theorems
9.13 and 9.14, there exists a nonsingular matrix $P_1$ such that $D = P_1^TAP_1 = (d_{ij})$ is a
diagonal representation of $f$ with a unique number $r$ of positive entries
followed by a unique number $s$ of negative entries. We let $t = n - r - s$ be the
unique number of zero entries in $D$. Now let $P_2$ be the diagonal matrix with
diagonal entries

$$(P_2)_{ii} = \begin{cases} 1/\sqrt{d_{ii}} & \text{for } i = 1, \dots, r \\ 1/\sqrt{-d_{ii}} & \text{for } i = r+1, \dots, r+s \\ 1 & \text{for } i = r+s+1, \dots, n. \end{cases}$$

Since $P_2$ is diagonal, it is obvious that $(P_2)^T = P_2$. We leave it to the reader to
multiply out the matrices and show that

$$P_2^TDP_2 = P_2^TP_1^TAP_1P_2 = (P_1P_2)^TA(P_1P_2)$$

is a congruence transformation that takes $A$ into the desired form. ∎
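The scaling step $P_2$ in this proof is simple to compute once a diagonal representation is known. The following sketch (function name ours, floating-point square roots) takes the diagonal of $D$, already ordered with positive entries first, then negative, then zero, and returns the diagonal of $P_2$, the resulting normal form, and the counts $(r, s, t)$.

```python
import math


def sylvester_normalize(d):
    """Scale a diagonal representation into the form diag(I_r, -I_s, 0_t).

    d lists the diagonal entries of D = P_1^T A P_1, ordered positive,
    negative, zero as in the corollary. Returns the diagonal of P_2, the
    diagonal of P_2^T D P_2, and the counts (r, s, t).
    """
    p2 = []
    for x in d:
        if x > 0:
            p2.append(1 / math.sqrt(x))      # i = 1, ..., r
        elif x < 0:
            p2.append(1 / math.sqrt(-x))     # i = r+1, ..., r+s
        else:
            p2.append(1.0)                   # i = r+s+1, ..., n
    # P_2 is diagonal, so (P_2^T D P_2)_ii = (P_2)_ii * d_ii * (P_2)_ii.
    normal = [p * x * p for p, x in zip(p2, d)]
    r = sum(1 for x in d if x > 0)
    s = sum(1 for x in d if x < 0)
    return p2, normal, (r, s, len(d) - r - s)
```

For instance, the diagonal $(1, -2, 18)$ of Example 9.9, renumbered as $(1, 18, -2)$, normalizes (up to rounding) to $(1, 1, -1)$ with $r = 2$, $s = 1$, $t = 0$.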

We say that a real symmetric bilinear form $f \in \mathcal{B}(V)$ is nonnegative (or
positive semidefinite) if $q(X) = X^TAX = \sum_{i,j} a_{ij}x^ix^j = f(X, X) \geq 0$ for all $X \in V$,
and we say that $f$ is positive definite if $q(X) > 0$ for all nonzero $X \in V$. In
particular, from Theorem 9.14 we see that $f$ is nonnegative semidefinite if and
only if the signature $S = r(f) \leq \dim V$, and $f$ will be positive definite if and
only if $S = \dim V$.
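These criteria need only the signs of the entries of any diagonal representation. A small illustrative sketch (function names ours):

```python
def rank_and_signature(diag):
    """Rank r(f) = P + N and signature S = P - N from a diagonal representation."""
    P = sum(1 for x in diag if x > 0)
    N = sum(1 for x in diag if x < 0)
    return P + N, P - N


def classify(diag):
    """Apply the criteria above, with n = dim V = len(diag)."""
    r, S = rank_and_signature(diag)
    if S == len(diag):
        return "positive definite"          # S = dim V
    if S == r:
        return "nonnegative semidefinite"   # S = r(f) < dim V, i.e. N = 0
    return "not nonnegative"                # some negative entry
```

With the diagonal $(1, -2, 18)$ from Example 9.9 it reports rank 3 and signature 1, so that form is not nonnegative. The forms of Example 9.10 below have diagonal representations $(1, 1)$ and $(1, 1, 0)$, and are reported as positive definite and nonnegative semidefinite respectively.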

Example 9.10 The quadratic form $(x^1)^2 - 4x^1x^2 + 5(x^2)^2$ is positive definite
because it can be written in the form

$$(x^1 - 2x^2)^2 + (x^2)^2$$

which is nonnegative for all real values of $x^1$ and $x^2$, and is zero only if $x^1 = x^2 = 0$.

The quadratic form $(x^1)^2 + (x^2)^2 + 2(x^3)^2 - 2x^1x^3 - 2x^2x^3$ can be written in
the form

$$(x^1 - x^3)^2 + (x^2 - x^3)^2.$$

Since this is nonnegative for all real values of $x^1$, $x^2$ and $x^3$ but is zero for
nonzero values (e.g., $x^1 = x^2 = x^3 \neq 0$), this quadratic form is nonnegative but
not positive definite. /
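Both completed-square identities can be spot-checked on a grid of integer points. This sketch is only a sanity check (a finite grid proves nothing by itself; the identities follow by expanding the squares), and the function names are ours:

```python
from itertools import product


def q1(x1, x2):
    return x1**2 - 4 * x1 * x2 + 5 * x2**2


def q2(x1, x2, x3):
    return x1**2 + x2**2 + 2 * x3**2 - 2 * x1 * x3 - 2 * x2 * x3


grid = range(-4, 5)
# The completed-square forms agree with the originals at every grid point.
assert all(q1(a, b) == (a - 2 * b)**2 + b**2 for a, b in product(grid, repeat=2))
assert all(q2(a, b, c) == (a - c)**2 + (b - c)**2 for a, b, c in product(grid, repeat=3))
# On the grid, q1 vanishes only at the origin; q2 also vanishes at x1 = x2 = x3 != 0.
assert [p for p in product(grid, repeat=2) if q1(*p) == 0] == [(0, 0)]
assert q2(1, 1, 1) == 0
```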
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Exercises 

1. Determine the rank and signature of the following real quadratic forms: 

(a) $x^2 + 2xy + y^2$.

(b) $x^2 + xy + 2xz + 2y^2 + 4yz + 2z^2$.

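2. Find the transition matrix $P$ such that $P^TAP$ is diagonal where $A$ is given
by:

(a) $\begin{pmatrix} 1 & 2 & -3 \\ 2 & 5 & -4 \\ -3 & -4 & \ast \end{pmatrix}$ (the $(3,3)$ entry is not legible in the source)

(b) $\begin{pmatrix} 0 & 1 & 1 \\ 1 & -2 & 2 \\ 1 & 2 & -1 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 1 & -2 & -3 \\ 1 & 2 & -5 & -1 \\ -2 & -5 & 6 & 9 \\ -3 & -1 & 9 & 11 \end{pmatrix}$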

3. Let $f$ be the symmetric bilinear form associated with the real quadratic
form $q(x, y) = ax^2 + bxy + cy^2$. Show that:

(a) $f$ is nondegenerate if and only if $b^2 - 4ac \neq 0$.

(b) $f$ is positive definite if and only if $a > 0$ and $b^2 - 4ac < 0$.

4. If $A$ is a real, symmetric, positive definite matrix, show there exists a
nonsingular matrix $P$ such that $A = P^TP$.

The remaining exercises are all related.

5. Let $V$ be finite-dimensional over $\mathbb{C}$, let $S$ be the subspace of all symmetric
bilinear forms on $V$, and let $Q$ be the set of all quadratic forms on $V$.

(a) Show that $Q$ is a subspace of all functions from $V$ to $\mathbb{C}$.

(b) Suppose $T \in L(V)$ and $q \in Q$. Show that the equation $(T^\dagger q)(v) =
q(Tv)$ defines a quadratic form $T^\dagger q$ on $V$.

(c) Show that the function $T^\dagger$ is a linear operator on $Q$, and show that $T^\dagger$
is invertible if and only if $T$ is invertible.

6. (a) Let $q$ be the quadratic form on $\mathbb{R}^2$ defined by $q(x, y) = ax^2 + 2bxy +
cy^2$ (where $a \neq 0$). Find an invertible $T \in L(\mathbb{R}^2)$ such that

$$(T^\dagger q)(x, y) = ax^2 + (c - b^2/a)y^2.$$

[Hint: Complete the square to find $T^{-1}$ (and hence $T$).]

(b) Let $q$ be the quadratic form on $\mathbb{R}^2$ defined by $q(x, y) = 2bxy$. Find an
invertible $T \in L(\mathbb{R}^2)$ such that

$$(T^\dagger q)(x, y) = 2bx^2 - 2by^2.$$

(c) Let $q$ be the quadratic form on $\mathbb{R}^3$ defined by $q(x, y, z) = xy + 2xz +
z^2$. Find an invertible $T \in L(\mathbb{R}^3)$ such that

$$(T^\dagger q)(x, y, z) = x^2 - y^2 + z^2.$$

7. Suppose $A \in M_n(\mathbb{R})$ is symmetric, and define a quadratic form $q$ on $\mathbb{R}^n$ by

$$q(X) = \sum_{i,j=1}^n a_{ij}x^ix^j.$$

Show there exists $T \in L(\mathbb{R}^n)$ such that

$$(T^\dagger q)(X) = \sum_{i=1}^n c_i(x^i)^2$$

where each $c_i$ is either 0 or $\pm 1$.

9.7 HERMITIAN FORMS 

Let us now briefly consider how some of the results of the previous sections 
carry over to the case of bilinear forms over the complex number field. Much 
of this material will be elaborated on in the next chapter. 

We say that a mapping $f: V \times V \to \mathbb{C}$ is a Hermitian form on $V$ if for all
$u_1, u_2, v \in V$ and $a, b \in \mathbb{C}$ we have

(1) $f(au_1 + bu_2, v) = a^*f(u_1, v) + b^*f(u_2, v)$.

(2) $f(u_1, v) = f(v, u_1)^*$.

(We should point out that many authors define a Hermitian form by requiring
that the scalars $a$ and $b$ on the right-hand side of property (1) not be the
complex conjugates as we have defined it. In this case, the scalars on the right-hand
side of property (3) below will be the complex conjugates of what we
have shown.) As was the case for the Hermitian inner product (see Section
2.4), we see that

$$\begin{aligned} f(u, av_1 + bv_2) &= f(av_1 + bv_2, u)^* = [a^*f(v_1, u) + b^*f(v_2, u)]^* \\ &= af(v_1, u)^* + bf(v_2, u)^* = af(u, v_1) + bf(u, v_2) \end{aligned}$$

which we state as

(3) $f(u, av_1 + bv_2) = af(u, v_1) + bf(u, v_2)$.

Since $f(u, u) = f(u, u)^*$ it follows that $f(u, u) \in \mathbb{R}$ for all $u \in V$.

Along with a Hermitian form $f$ is the associated Hermitian quadratic
form $q: V \to \mathbb{R}$ defined by $q(u) = f(u, u)$ for all $u \in V$. A little algebra
(Exercise 9.7.1) shows that $f$ may be obtained from $q$ by the polar form
expression of $f$, which is

$$f(u, v) = (1/4)[q(u + v) - q(u - v)] - (i/4)[q(u + iv) - q(u - iv)].$$
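The polar form can be checked numerically for the form $f(X, Y) = X^\dagger HY$ of Example 9.11 below. In this sketch the matrix $H$ and vectors $u, v$ are arbitrary illustrative choices, the function names are ours, and Python's built-in complex numbers are used:

```python
def herm_form(H, X, Y):
    """f(X, Y) = X^dagger H Y: conjugate-linear in X, linear in Y."""
    n = len(H)
    return sum(X[i].conjugate() * H[i][j] * Y[j] for i in range(n) for j in range(n))


def polar(H, u, v):
    """Recover f(u, v) from the quadratic form q(w) = f(w, w) alone."""
    def q(w):
        return herm_form(H, w, w).real       # q is real-valued when H is Hermitian

    def comb(c):
        return [a + c * b for a, b in zip(u, v)]   # the vector u + c v

    return (q(comb(1)) - q(comb(-1))) / 4 - 1j * (q(comb(1j)) - q(comb(-1j))) / 4


H = [[2, 1 + 1j], [1 - 1j, 3]]   # a Hermitian matrix (illustrative choice)
u, v = [1 + 2j, -1j], [3, 2 - 1j]
print(abs(polar(H, u, v) - herm_form(H, u, v)) < 1e-12)  # True
```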

We also say that $f$ is nonnegative semidefinite if $q(u) = f(u, u) \geq 0$ for all $u \in V$,
and positive definite if $q(u) = f(u, u) > 0$ for all nonzero $u \in V$. For
example, the usual Hermitian inner product on $\mathbb{C}^n$ is a positive definite form
since for every nonzero $X = (x^1, \dots, x^n) \in \mathbb{C}^n$ we have

$$q(X) = f(X, X) = (X, X) = \sum_{i=1}^n (x^i)^*x^i = \sum_{i=1}^n |x^i|^2 > 0.$$

As we defined in Section 8.1, we say that a matrix $H = (h_{ij}) \in M_n(\mathbb{C})$ is
Hermitian if $h_{ij} = h_{ji}^*$. In other words, $H$ is Hermitian if $H = H^{*T}$. We denote
the operation of taking the transpose along with taking the complex conjugate
of a matrix $A$ by $A^\dagger$ (read "A dagger"). In other words, $A^\dagger = A^{*T}$. For reasons
that will become clear in the next chapter, we frequently call $A^\dagger$ the
(Hermitian) adjoint of $A$. Thus $H$ is Hermitian if $H^\dagger = H$.

Note also that for any scalar $k$ we have $k^\dagger = k^*$. Furthermore, using
Theorem 3.18(d), we see that

$$(AB)^\dagger = (AB)^{*T} = (A^*B^*)^T = B^\dagger A^\dagger.$$

By induction, this obviously extends to any finite product of matrices. It is
also clear that

$$A^{\dagger\dagger} = A.$$
Example 9.11 Let $H$ be a Hermitian matrix. We show that $f(X, Y) = X^\dagger HY$
defines a Hermitian form on $\mathbb{C}^n$. Let $X_1, X_2, Y \in \mathbb{C}^n$ be arbitrary, and let $a,
b \in \mathbb{C}$. Then (using Theorem 3.18(a))

$$\begin{aligned} f(aX_1 + bX_2, Y) &= (aX_1 + bX_2)^\dagger HY \\ &= (a^*X_1^\dagger + b^*X_2^\dagger)HY \\ &= a^*X_1^\dagger HY + b^*X_2^\dagger HY \\ &= a^*f(X_1, Y) + b^*f(X_2, Y) \end{aligned}$$

which shows that $f(X, Y)$ satisfies property (1) of a Hermitian form. Now,
since $X^\dagger HY$ is a (complex) scalar we have $(X^\dagger HY)^T = X^\dagger HY$, and therefore

$$f(X, Y)^* = (X^\dagger HY)^* = (X^\dagger HY)^\dagger = Y^\dagger HX = f(Y, X)$$

where we used the fact that $H^\dagger = H$. Thus $f(X, Y)$ satisfies property (2), and
hence defines a Hermitian form on $\mathbb{C}^n$.

It is probably worth pointing out that $X^\dagger HY$ will not be a Hermitian form
if the alternative definition mentioned above is used. In this case, one must
use $f(X, Y) = X^THY^*$ (see Exercise 9.7.2). /

Now let $V$ have basis $\{e_i\}$, and let $f$ be a Hermitian form on $V$. Then for
any $X = \sum x^ie_i$ and $Y = \sum y^ie_i$ in $V$, we see that

$$f(X, Y) = f\Big(\sum_i x^ie_i, \sum_j y^je_j\Big) = \sum_{i,j} x^{i*}y^jf(e_i, e_j).$$

Just as we did in Theorem 9.9, we define the matrix elements $h_{ij}$ representing
a Hermitian form $f$ by $h_{ij} = f(e_i, e_j)$. Note that since $f(e_i, e_j) = f(e_j, e_i)^*$, we see
that the diagonal elements of $H = (h_{ij})$ must be real. Using this definition for
the matrix elements of $f$, we then have

$$f(X, Y) = \sum_{i,j} x^{i*}h_{ij}y^j = X^\dagger HY.$$

Following the proof of Theorem 9.9, this shows that any Hermitian form $f$ has
a unique representation in terms of the Hermitian matrix $H$.

If we want to make explicit the basis referred to in this expression, we
write $f(X, Y) = [X]_e^\dagger H[Y]_e$ where it is understood that the elements $h_{ij}$ are
defined with respect to the basis $\{e_i\}$. Finally, let us prove the complex
analogues of Theorems 9.11 and 9.14.
Theorem 9.15 Let $f$ be a Hermitian form on $V$, and let $P$ be the transition
matrix from a basis $\{e_i\}$ for $V$ to a new basis $\{e'_i\}$. If $H$ is the matrix of $f$ with
respect to the basis $\{e_i\}$ for $V$, then $H' = P^\dagger HP$ is the matrix of $f$ relative to the
new basis $\{e'_i\}$.

Proof We saw in the proof of Theorem 9.11 that for any $X \in V$ we have
$[X]_e = P[X]_{e'}$, and hence $[X]_e^\dagger = [X]_{e'}^\dagger P^\dagger$. Therefore, for any $X, Y \in V$ we
see that

$$f(X, Y) = [X]_e^\dagger H[Y]_e = [X]_{e'}^\dagger P^\dagger HP[Y]_{e'} = [X]_{e'}^\dagger H'[Y]_{e'}$$

where $H' = P^\dagger HP$ is the (unique) matrix of $f$ relative to the basis $\{e'_i\}$. ∎
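Theorem 9.15 can be spot-checked with small-integer complex arithmetic. The transition matrix $P$ and the coordinate vectors below are arbitrary illustrative choices, and $H$ is the Hermitian matrix of Exercise 9.7.5(b). The sketch computes $f(X, Y)$ once from old-basis coordinates with $H$, and once from new-basis coordinates with $H' = P^\dagger HP$:

```python
def dagger(A):
    """Conjugate transpose A^dagger of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def form(H, x, y):
    """[x]^dagger H [y] for coordinate vectors x, y."""
    return sum(x[i].conjugate() * H[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


H = [[1, 2 + 3j], [2 - 3j, -1]]       # matrix of f in the old basis {e_i}
P = [[1, 1j], [0, 2]]                 # nonsingular transition matrix (illustrative)
H_new = matmul(matmul(dagger(P), H), P)

x_new, y_new = [1 - 1j, 2], [3j, -1]  # [X]_{e'} and [Y]_{e'}
x_old = [sum(P[i][j] * x_new[j] for j in range(2)) for i in range(2)]  # [X]_e = P[X]_{e'}
y_old = [sum(P[i][j] * y_new[j] for j in range(2)) for i in range(2)]
print(abs(form(H, x_old, y_old) - form(H_new, x_new, y_new)) < 1e-9)  # True
```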

Theorem 9.16 Let f be a Hermitian form on V. Then there exists a basis for 
V in which the matrix of f is diagonal, and every other diagonal representation 
of f has the same number of positive and negative entries. 

Proof Using the fact that $f(u, u)$ is real for all $u \in V$ along with the appropriate
polar form of $f$, it should be easy for the reader to follow the proofs of
Theorems 9.13 and 9.14 and complete the proof of this theorem (see Exercise
9.7.3). ∎

We note that because of this result, our earlier definition for the signature 
of a bilinear form applies equally well to Hermitian forms. 

Exercises 

1. Let f be a Hermitian form on V and q the associated quadratic form. 
Verify the polar form 

$$f(u, v) = (1/4)[q(u + v) - q(u - v)] - (i/4)[q(u + iv) - q(u - iv)].$$

2. Verify the statement made at the end of Example 9.11. 

3. Prove Theorem 9.16.

4. Show that the algorithm described in Section 9.6 applies to Hermitian
matrices if we allow multiplication by complex numbers and, instead of
multiplying by $E^T$ on the right, we multiply by $E^{*T}$.



5. For each of the following Hermitian matrices $H$, use the results of the
previous exercise to find a nonsingular matrix $P$ such that $P^\dagger HP$ is diagonal:
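(a) (matrix not legible in the source)

(b) $\begin{pmatrix} 1 & 2 + 3i \\ 2 - 3i & -1 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & i & 2 + i \\ -i & 2 & 1 - i \\ 2 - i & 1 + i & 2 \end{pmatrix}$

(d) $\begin{pmatrix} 1 & 1 + i & 2i \\ 1 - i & 4 & 2 - 3i \\ -2i & 2 + 3i & 7 \end{pmatrix}$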



9.8 SIMULTANEOUS DIAGONALIZATION * 

We now apply the results of Sections 8.1, 9.5 and 9.6 to the problem of simul- 
taneously diagonalizing two real quadratic forms. After the proof we shall 
give an example of how this result applies to classical mechanics. 

Theorem 9.17 Let $X^TAX$ and $X^TBX$ be two real quadratic forms on an
$n$-dimensional Euclidean space $V$, and assume that $X^TAX$ is positive definite.
Then there exists a nonsingular matrix $P$ such that the transformation $X = PY$
reduces $X^TAX$ to the form

$$X^TAX = Y^TY = (y^1)^2 + \cdots + (y^n)^2$$

and $X^TBX$ to the form

$$X^TBX = Y^TDY = \lambda_1(y^1)^2 + \cdots + \lambda_n(y^n)^2$$

where $\lambda_1, \dots, \lambda_n$ are roots of the equation

$$\det(B - \lambda A) = 0.$$

Moreover, the $\lambda_i$ are real and positive if and only if $X^TBX$ is positive definite.

Proof Since $A$ is symmetric, Theorem 9.13 tells us there exists a basis for $V$
that diagonalizes $A$. Furthermore, the corollary to Theorem 9.14 and the
discussion following it show that, because $A$ is positive definite, the
corresponding nonsingular transition matrix $R$ may be chosen so that the
transformation $X = RY$ yields

$$X^TAX = Y^TY = (y^1)^2 + \cdots + (y^n)^2.$$

Note that $Y^TY = X^TAX = Y^TR^TARY$ implies that

$$R^TAR = I.$$

We also emphasize that R will not be orthogonal in general. 

Now observe that R T BR is a real symmetric matrix since B is, and hence 
(by the corollary to Theorem 8.2) there exists an orthogonal matrix Q such 
that 

$$Q^TR^TBRQ = (RQ)^TB(RQ) = \mathrm{diag}(\lambda_1, \dots, \lambda_n) = D$$

where the $\lambda_i$ are the eigenvalues of $R^TBR$. If we define the nonsingular (and
not generally orthogonal) matrix $P = RQ$, then

$$P^TBP = D$$

and

$$P^TAP = Q^TR^TARQ = Q^TIQ = I.$$
Under the transformation $X = PY$, we are then left with

$$X^TAX = Y^TP^TAPY = Y^TY$$

as before, while

$$X^TBX = Y^TP^TBPY = Y^TDY = \lambda_1(y^1)^2 + \cdots + \lambda_n(y^n)^2$$

as desired.

Now note that by definition, the $\lambda_i$ are roots of the equation

$$\det(R^TBR - \lambda I) = 0.$$

Using $R^TAR = I$ this may be written as

$$\det[R^T(B - \lambda A)R] = 0.$$

Since $\det R = \det R^T \neq 0$, we find that (using Theorem 4.8)

$$\det(B - \lambda A) = 0.$$

Finally, since $B$ is a real symmetric matrix, there exists an orthogonal
matrix $S$ that brings it into the form

$$S^TBS = \mathrm{diag}(\mu_1, \dots, \mu_n) = \tilde D$$

where the $\mu_i$ are the eigenvalues of $B$. Writing $X = SY$, we see that

$$X^TBX = Y^TS^TBSY = Y^T\tilde DY = \mu_1(y^1)^2 + \cdots + \mu_n(y^n)^2$$

and thus $X^TBX$ is positive definite if and only if $Y^T\tilde DY$ is positive definite,
i.e., if and only if every $\mu_i > 0$. Since we saw above that

$$P^TBP = \mathrm{diag}(\lambda_1, \dots, \lambda_n) = D$$

it follows from Theorem 9.14 that the number of positive $\mu_i$ must equal the
number of positive $\lambda_i$. Therefore $X^TBX$ is positive definite if and only if every
$\lambda_i > 0$. ∎
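For two variables the construction in this proof can be carried out in closed form. The sketch below is ours (the function name `simultaneous_diagonalize` and the test matrices are illustrative assumptions). It finds $R$ from a Cholesky factorization $A = LL^T$ by taking $R = (L^T)^{-1}$, so that $R^TAR = I$. It then uses a single plane rotation as the orthogonal $Q$ that diagonalizes $R^TBR$. Floating-point arithmetic is used throughout.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def simultaneous_diagonalize(A, B):
    """Return (P, [lambda_1, lambda_2]) with P^T A P = I and P^T B P diagonal.

    2x2 sketch: A must be symmetric positive definite, B symmetric.
    """
    # Step 1 (the matrix R of the proof): A = L L^T with L lower triangular,
    # and R = (L^T)^{-1}, so that R^T A R = L^{-1} (L L^T) (L^T)^{-1} = I.
    l11 = math.sqrt(A[0][0])
    l21 = A[1][0] / l11
    l22 = math.sqrt(A[1][1] - l21 * l21)
    R = [[1 / l11, -l21 / (l11 * l22)], [0.0, 1 / l22]]
    # Step 2 (the matrix Q of the proof): C = R^T B R is real symmetric, and the
    # rotation through theta = atan2(2 c_12, c_11 - c_22) / 2 diagonalizes it.
    C = matmul(matmul(transpose(R), B), R)
    theta = 0.5 * math.atan2(2 * C[0][1], C[0][0] - C[1][1])
    Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
    P = matmul(R, Q)
    D = matmul(matmul(transpose(P), B), P)
    return P, [D[0][0], D[1][1]]


A = [[2.0, 1.0], [1.0, 2.0]]    # positive definite
B = [[1.0, 0.0], [0.0, -1.0]]   # indefinite
P, lam = simultaneous_diagonalize(A, B)
print(sorted(lam))   # approximately [-0.57735, 0.57735], i.e. -1/sqrt(3) and 1/sqrt(3)
```

Here $\det(B - \lambda A) = 3\lambda^2 - 1$, so the two $\lambda_i$ are $\pm 1/\sqrt3$. One of them is negative, consistent with $X^TBX$ not being positive definite.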

Example 9.12 Let us show how Theorem 9.17 can be of help in classical
mechanics. This example requires a knowledge of both the Lagrange equations
of motion and Taylor series expansions. The details of the physics are
given in, e.g., the classic text by Goldstein (1980). Our purpose is simply to
demonstrate the usefulness of this theorem.

Consider the small oscillations of a conservative system of $N$ particles
about a point of stable equilibrium. We assume that the position $\mathbf{r}_i$ of the $i$th
particle is a function of $n$ generalized coordinates $q_i$, and does not depend
explicitly on the time $t$. Thus we write $\mathbf{r}_i = \mathbf{r}_i(q_1, \dots, q_n)$, and

$$\dot{\mathbf{r}}_i = \frac{d\mathbf{r}_i}{dt} = \sum_{j=1}^n \frac{\partial\mathbf{r}_i}{\partial q_j}\,\dot q_j$$

where we denote the derivative with respect to time by a dot.

Since the velocity $\mathbf{v}_i$ of the $i$th particle is given by $\dot{\mathbf{r}}_i$, the kinetic energy
of the $i$th particle is $(1/2)m_i(\mathbf{v}_i)^2 = (1/2)m_i\dot{\mathbf{r}}_i\cdot\dot{\mathbf{r}}_i$, and hence the kinetic energy $T$
of the system of $N$ particles is given by

$$T = \sum_{i=1}^N \frac12 m_i\,\dot{\mathbf{r}}_i\cdot\dot{\mathbf{r}}_i = \sum_{j,k=1}^n M_{jk}\,\dot q_j\,\dot q_k$$

where

$$M_{jk} = \sum_{i=1}^N \frac12 m_i\,\frac{\partial\mathbf{r}_i}{\partial q_j}\cdot\frac{\partial\mathbf{r}_i}{\partial q_k}.$$

Thus the kinetic energy is a quadratic form in the generalized velocities $\dot q_j$. We
also assume that the equilibrium position of each $q_i$ is at $q_i = 0$. Let the potential
energy of the system be $V = V(q_1, \dots, q_n)$. Expanding $V$ in a Taylor
series about the equilibrium point, we have (using an obvious notation
for evaluating functions at equilibrium)

$$V(q_1, \dots, q_n) = V(0) + \sum_{i=1}^n \left(\frac{\partial V}{\partial q_i}\right)_0 q_i + \frac12\sum_{i,j=1}^n \left(\frac{\partial^2V}{\partial q_i\,\partial q_j}\right)_0 q_iq_j + \cdots.$$
At equilibrium, the force on any particle vanishes, and hence we must have
$(\partial V/\partial q_i)_0 = 0$ for every $i$. Furthermore, we may shift the zero of potential and
assume that $V(0) = 0$ because this has no effect on the force on each particle.
We may therefore write the potential as the quadratic form

$$V = \sum_{i,j=1}^n b_{ij}\,q_iq_j$$

where the $b_{ij} = \frac12(\partial^2V/\partial q_i\,\partial q_j)_0$ are constants, and $b_{ij} = b_{ji}$. Returning to the kinetic energy, we
expand about the equilibrium position to obtain

$$M_{ij}(q_1, \dots, q_n) = M_{ij}(0) + \sum_{k=1}^n \left(\frac{\partial M_{ij}}{\partial q_k}\right)_0 q_k + \cdots.$$

To a first approximation, we may keep only the first (constant) term in this
expansion. Then denoting $M_{ij}(0)$ by $a_{ij} = a_{ji}$ we have

$$T = \sum_{i,j=1}^n a_{ij}\,\dot q_i\,\dot q_j$$

so that $T$ is also a quadratic form.

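The Lagrange equations of motion are

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}\right) - \frac{\partial L}{\partial q_i} = 0$$

where $L = T - V$ is called the Lagrangian. Since $T$ is a function of the $\dot q_i$ and
$V$ is a function of the $q_i$, the equations of motion take the form

$$\frac{d}{dt}\left(\frac{\partial T}{\partial \dot q_i}\right) = -\frac{\partial V}{\partial q_i}. \qquad (*)$$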

Now, the physical nature of the kinetic energy tells us that $T$ must be a positive
definite quadratic form, and hence we seek to diagonalize $T$ as follows.

Define new coordinates $q'_1, \dots, q'_n$ by $q_i = \sum_j p_{ij}q'_j$ where $P = (p_{ij})$ is a
nonsingular constant matrix. Then differentiating with respect to time yields
$\dot q_i = \sum_j p_{ij}\dot q'_j$, so that the $\dot q_i$ are transformed in the same manner as the $q_i$. By
Theorem 9.17, the transformation $P$ may be chosen so that $T$ and $V$ take the
forms

$$T = (\dot q'_1)^2 + \cdots + (\dot q'_n)^2$$

and

$$V = \lambda_1(q'_1)^2 + \cdots + \lambda_n(q'_n)^2.$$

Since $V = 0$ at $q_1 = \cdots = q_n = 0$, the fact that $P$ is nonsingular tells us that $V = 0$
at $q'_1 = \cdots = q'_n = 0$ as well, and that the $q'_i$ all vanish only when the $q_i$ do.
Because the equilibrium is stable, we take $V$ to have a strict minimum there, so
that (in this quadratic approximation) $V > 0$ whenever the $q'_i$ are not all zero.
Thus we see that $V$ is also positive definite, and hence each $\lambda_i > 0$. This means
that we may write $\lambda_i = \omega_i^2$ where each $\omega_i$ is real and positive.

Since $P$ is a constant matrix, the equations of motion (*) are just as valid
for $T$ and $V$ expressed in terms of the $q'_i$ and $\dot q'_i$. Therefore, substituting these
expressions for $T$ and $V$ into (*), we obtain the equations of motion

$$\frac{d^2q'_i}{dt^2} = -\omega_i^2\,q'_i.$$

For each $i = 1, \dots, n$ the solution to this equation is

$$q'_i = \alpha_i\cos(\omega_it + \beta_i)$$

where the $\alpha_i$ and $\beta_i$ are constants to be determined from the initial conditions of
the problem.

The coordinates $q'_i$ are called the normal coordinates for the system of
particles, and the form of the solution shows that each normal coordinate
executes simple harmonic motion. /
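For two degrees of freedom, the normal-mode frequencies $\omega_i = \sqrt{\lambda_i}$ can be computed directly from $\det(B - \lambda A) = 0$, where $A = (a_{ij})$ and $B = (b_{ij})$ are the matrices of $T$ and $V$. The spring system below is our own illustrative choice, not taken from the text, and the function name is hypothetical:

```python
import math


def normal_mode_frequencies(A, B):
    """Angular frequencies omega_i = sqrt(lambda_i) for two degrees of freedom.

    A, B: symmetric 2x2 matrices of T = sum a_ij qdot_i qdot_j and
    V = sum b_ij q_i q_j, both assumed positive definite. The lambda_i are the
    roots of det(B - lambda A) = 0, written out as a quadratic in lambda.
    """
    alpha = A[0][0] * A[1][1] - A[0][1] ** 2
    beta = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2 * A[0][1] * B[0][1])
    gamma = B[0][0] * B[1][1] - B[0][1] ** 2
    disc = max(beta ** 2 - 4 * alpha * gamma, 0.0)   # >= 0 in exact arithmetic
    roots = [(-beta - math.sqrt(disc)) / (2 * alpha), (-beta + math.sqrt(disc)) / (2 * alpha)]
    return [math.sqrt(lam) for lam in roots]


# Illustrative system: two unit masses joined to two walls and to each other by
# three unit springs, so T = (1/2)(qdot_1^2 + qdot_2^2) and
# V = (1/2)[q_1^2 + (q_2 - q_1)^2 + q_2^2].
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[1.0, -0.5], [-0.5, 1.0]]
print(normal_mode_frequencies(A, B))   # approximately [1.0, 1.732] (1 and sqrt(3))
```

The common factor $1/2$ in $T$ and $V$ cancels in $\det(B - \lambda A) = 0$, which is why the frequencies do not depend on whether that factor is absorbed into the $a_{ij}$ and $b_{ij}$.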

For additional applications related to this example, we refer the reader to
any advanced text on classical mechanics, such as those listed in the
bibliography. (See, e.g., Marion, chapter 13.6.)



CHAPTER 10 



Linear Operators 



Recall that a linear transformation $T \in L(V)$ of a vector space into itself is
called a (linear) operator. In this chapter we shall elaborate somewhat on the 
theory of operators. In so doing, we will define several important types of 
operators, and we will also prove some important diagonalization theorems. 
Much of this material is directly useful in physics and engineering as well as 
in mathematics. While some of this chapter overlaps with Chapter 8, we 
assume that the reader has studied at least Section 8.1. 



10.1 LINEAR FUNCTIONALS AND ADJOINTS 

Recall that in Theorem 9.3 we showed that for a finite-dimensional real inner
product space $V$, the mapping $u \mapsto L_u = (u, \cdot)$ was an isomorphism of $V$ onto
$V^*$. This mapping had the property that $L_{au}v = (au, v) = a(u, v) = aL_uv$, and
hence $L_{au} = aL_u$ for all $u \in V$ and $a \in \mathbb{R}$. However, if $V$ is a complex space
with a Hermitian inner product, then $L_{au}v = (au, v) = a^*(u, v) = a^*L_uv$, and
hence $L_{au} = a^*L_u$, which is not even linear (this was the definition of an
antilinear (or conjugate linear) transformation given in Section 9.2). Fortunately,
there is a closely related result that holds even for complex vector spaces.

Let $V$ be finite-dimensional over $\mathbb{C}$, and assume that $V$ has an inner
product $(\,,\,)$ defined on it (this is just a positive definite Hermitian form on $V$).
Thus for any $X, Y \in V$ we have $(X, Y) \in \mathbb{C}$. For example, with respect to the
standard basis $\{e_i\}$ for $\mathbb{C}^n$ (which is the same as the standard basis for $\mathbb{R}^n$), we
have $X = \sum x^ie_i$ and hence (see Example 2.13)

$$(X, Y) = \Big(\sum_i x^ie_i, \sum_j y^je_j\Big) = \sum_{i,j} x^{i*}y^j(e_i, e_j) = \sum_{i,j} x^{i*}y^j\delta_{ij} = \sum_i x^{i*}y^i = X^{*T}Y.$$

Note that we are temporarily writing $X^{*T}$ rather than $X^\dagger$. We will shortly
explain the reason for this (see Theorem 10.2 below). In particular, for any
$T \in L(V)$ and $X \in V$ we have the vector $TX \in V$, and hence it is meaningful
to write expressions of the form $(TX, Y)$ and $(X, TY)$.

Since we are dealing with finite-dimensional vector spaces, the Gram-Schmidt
process (Theorem 2.21) guarantees that we can always work with an
orthonormal basis. Hence, let us consider a complex inner product space $V$
with basis $\{e_i\}$ such that $(e_i, e_j) = \delta_{ij}$. Then, just as we saw in the proof of
Theorem 9.1, we now see that for any $u = \sum u^je_j \in V$ we have

$$(e_i, u) = \Big(e_i, \sum_j u^je_j\Big) = \sum_j u^j(e_i, e_j) = \sum_j u^j\delta_{ij} = u^i$$

and thus

$$u = \sum_i (e_i, u)e_i.$$

Now consider the vector $Te_j$. Applying the result of the previous paragraph we have

$$Te_j = \sum_i (e_i, Te_j)e_i.$$

But this is precisely the definition of the matrix $A = (a_{ij})$ that represents $T$
relative to the basis $\{e_i\}$. In other words, this extremely important result shows
that the matrix elements of the operator $T \in L(V)$ are given by

$$a_{ij} = (e_i, Te_j).$$

It is important to note, however, that this definition depended on the use of an
orthonormal basis for $V$. To see the self-consistency of this definition, we go
back to our original definition of $(a_{ij})$ as $Te_j = \sum_k e_ka_{kj}$. Taking the scalar
product of both sides of this equation with $e_i$ yields (using the orthonormality
of the $e_i$)

$$(e_i, Te_j) = \Big(e_i, \sum_k e_ka_{kj}\Big) = \sum_k a_{kj}(e_i, e_k) = \sum_k a_{kj}\delta_{ik} = a_{ij}.$$
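The formula $a_{ij} = (e_i, Te_j)$ works for any orthonormal basis, not just the standard one. The sketch below uses an illustrative operator on $\mathbb{C}^2$ and the orthonormal basis $e_1 = (1, i)/\sqrt2$, $e_2 = (1, -i)/\sqrt2$, both chosen by us. The helper names are hypothetical.

```python
import math


def inner(x, y):
    """(x, y) = sum_i x_i^* y_i: conjugate-linear in x, linear in y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def apply(M, x):
    """Action of T on a vector, with T given by its standard-basis matrix M."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def matrix_elements(M, basis):
    """a_ij = (e_i, T e_j) relative to an orthonormal basis {e_i}."""
    n = len(basis)
    return [[inner(basis[i], apply(M, basis[j])) for j in range(n)] for i in range(n)]


r = 1 / math.sqrt(2)
basis = [[r, 1j * r], [r, -1j * r]]   # orthonormal basis of C^2 (illustrative)
M = [[2, 1j], [0, 3]]                 # T in the standard basis (illustrative)
A = matrix_elements(M, basis)

# Consistency check: T e_j = sum_k e_k a_kj for each j.
for j in range(2):
    Te_j = apply(M, basis[j])
    expansion = [sum(basis[k][i] * A[k][j] for k in range(2)) for i in range(2)]
    print(all(abs(s - t) < 1e-12 for s, t in zip(Te_j, expansion)))   # True
```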

We now prove the complex analogue of Theorem 9.3. 
Theorem 10.1 Let $V$ be a finite-dimensional inner product space over $\mathbb{C}$.
Then, given any linear functional $L$ on $V$, there exists a unique $u \in V$ such
that $Lv = (u, v)$ for all $v \in V$.

Proof Let $\{e_i\}$ be an orthonormal basis for $V$ and define $u = \sum_i (Le_i)^*e_i$. Now
define the linear functional $L_u$ on $V$ by $L_uv = (u, v)$ for every $v \in V$. Then, in
particular, we have

$$L_ue_i = (u, e_i) = \Big(\sum_j (Le_j)^*e_j, e_i\Big) = \sum_j Le_j(e_j, e_i) = \sum_j Le_j\delta_{ji} = Le_i.$$

Since $L$ and $L_u$ agree on a basis for $V$, they must agree on any $v \in V$, and
hence $L = L_u = (u, \cdot)$.

As to the uniqueness of the vector $u$, suppose $u' \in V$ has the property that
$Lv = (u', v)$ for every $v \in V$. Then $Lv = (u, v) = (u', v)$ so that $(u - u', v) = 0$.
Since $v$ was arbitrary we may choose $v = u - u'$. Then $(u - u', u - u') = 0$,
which implies (since the inner product is just a positive definite Hermitian
form) that $u - u' = 0$, or $u = u'$. ∎
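The vector $u = \sum_i (Le_i)^*e_i$ from this proof can be computed directly. In the sketch below, the functional $L$ on $\mathbb{C}^3$ and the second orthonormal basis are illustrative choices of ours, and the helper names are hypothetical. It checks that $Lv = (u, v)$, and that two different orthonormal bases produce the same $u$, as uniqueness requires.

```python
import math


def inner(x, y):
    """(x, y) = sum_i x_i^* y_i, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def riesz_vector(L, basis):
    """u = sum_i (L e_i)^* e_i for an orthonormal basis {e_i}."""
    u = [0j] * len(basis[0])
    for e in basis:
        c = L(e).conjugate()
        u = [ui + c * ei for ui, ei in zip(u, e)]
    return u


def L(v):
    # An illustrative linear functional on C^3 (not from the text).
    return 2 * v[0] - 1j * v[1] + (1 + 1j) * v[2]


std = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
r = 1 / math.sqrt(2)
other = [[r, 1j * r, 0], [r, -1j * r, 0], [0, 0, 1]]   # another orthonormal basis

u1, u2 = riesz_vector(L, std), riesz_vector(L, other)
v = [1 - 1j, 2, 3j]
print(abs(inner(u1, v) - L(v)) < 1e-12, max(abs(a - b) for a, b in zip(u1, u2)) < 1e-12)  # True True
```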

The importance of finite-dimensionality in this theorem is shown by the 
following example. 

Example 10.1 Let $V = \mathbb{R}[x]$ be the (infinite-dimensional) space of all polynomials
over $\mathbb{R}$, and define an inner product on $V$ by

$$(f, g) = \int_0^1 f(x)g(x)\,dx$$

for every $f, g \in V$. We will give an example of a linear functional $L$ on $V$ for
which there does not exist a polynomial $h \in V$ with the property that $Lf =
(h, f)$ for all $f \in V$.

To show this, define the nonzero linear functional $L$ by

$$Lf = f(0).$$

($L$ is nonzero since, e.g., $L(a + x) = a$.) Now suppose there exists a polynomial
$h \in V$ such that $Lf = f(0) = (h, f)$ for every $f \in V$. Then, in particular, we have

$$L(xf) = 0 \cdot f(0) = 0 = (h, xf)$$

for every $f \in V$. Choosing $f = xh$ we see that

$$0 = (h, x^2h) = \int_0^1 x^2h^2\,dx.$$

Since the integrand is continuous and nonnegative, it must vanish identically on
$[0, 1]$, and this forces $h$ to be the zero polynomial.
Thus we are left with $Lf = (h, f) = (0, f) = 0$ for every $f \in V$, and hence $L = 0$.
But this contradicts the fact that $L \neq 0$, and hence no such polynomial $h$ can
exist.

Note the fact that $V$ is infinite-dimensional is required when we choose $f =
xh$. The reason for this is that if $V$ consisted of all polynomials of degree $\leq$
some positive integer $N$, then $f = xh$ could have degree $> N$. /

Now consider an operator $T \in L(V)$, and let $u$ be an arbitrary element of
$V$. Then the mapping $L_u: V \to \mathbb{C}$ defined by $L_uv = (u, Tv)$ for every $v \in V$ is
a linear functional on $V$. Applying Theorem 10.1, we see that there exists a
unique $u' \in V$ such that $(u, Tv) = L_uv = (u', v)$ for every $v \in V$. We now
define the mapping $T^\dagger: V \to V$ by $T^\dagger u = u'$. In other words, we define the
adjoint $T^\dagger$ of an operator $T \in L(V)$ by

$$(T^\dagger u, v) = (u, Tv)$$

for all $u, v \in V$. The mapping $T^\dagger$ is unique because $u'$ is unique for a given $u$.
Thus, if $\tilde T^\dagger u = u' = T^\dagger u$, then $(\tilde T^\dagger - T^\dagger)u = 0$ for every $u \in V$, and hence
$\tilde T^\dagger - T^\dagger = 0$ or $\tilde T^\dagger = T^\dagger$.
Note further that

$$(Tu, v) = (v, Tu)^* = (T^\dagger v, u)^* = (u, T^\dagger v).$$

However, it follows from the definition that $(u, T^\dagger v) = (T^{\dagger\dagger}u, v)$. Therefore
the uniqueness of the adjoint implies that $T^{\dagger\dagger} = T$.

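Let us show that the map $T^\dagger$ is linear. For all $u_1, u_2, v \in V$ and $a, b \in \mathbb{C}$ we
have

$$\begin{aligned} (T^\dagger(au_1 + bu_2), v) &= (au_1 + bu_2, Tv) \\ &= a^*(u_1, Tv) + b^*(u_2, Tv) \\ &= a^*(T^\dagger u_1, v) + b^*(T^\dagger u_2, v) \\ &= (aT^\dagger u_1, v) + (bT^\dagger u_2, v) \\ &= (aT^\dagger u_1 + bT^\dagger u_2, v). \end{aligned}$$

Since this is true for every $v \in V$, we must have

$$T^\dagger(au_1 + bu_2) = aT^\dagger u_1 + bT^\dagger u_2.$$

Thus $T^\dagger$ is linear and $T^\dagger \in L(V)$.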
If $\{e_i\}$ is an orthonormal basis for $V$, then the matrix elements of $T$ are
given by $a_{ij} = (e_i, Te_j)$. Similarly, the matrix elements $b_{ij}$ of $T^\dagger$ are related to
those of $T$ because

$$b_{ij} = (e_i, T^\dagger e_j) = (Te_i, e_j) = (e_j, Te_i)^* = a_{ji}^*.$$

In other words, if $A$ is the matrix representation of $T$ relative to the
orthonormal basis $\{e_i\}$, then $A^{*T}$ is the matrix representation of $T^\dagger$. This explains the
symbol and terminology for the Hermitian adjoint used in the last chapter.
Note that if $V$ is a real vector space, then the matrix representation of $T^\dagger$ is
simply $A^T$, and we may denote the corresponding operator by $T^T$.

We summarize this discussion in the following theorem, which is valid
only in finite-dimensional vector spaces. (It is also worth pointing out that $T^\dagger$
depends on the particular inner product defined on $V$.)

Theorem 10.2 Let $T$ be a linear operator on a finite-dimensional complex
inner product space $V$. Then there exists a unique linear operator $T^\dagger$ on $V$
defined by $(T^\dagger u, v) = (u, Tv)$ for all $u, v \in V$. Furthermore, if $A$ is the matrix
representation of $T$ relative to an orthonormal basis $\{e_i\}$, then $A^\dagger = A^{*T}$ is the
matrix representation of $T^\dagger$ relative to this same basis. If $V$ is a real space,
then the matrix representation of $T^\dagger$ is simply $A^T$.
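In the standard (orthonormal) basis of $\mathbb{C}^n$, Theorem 10.2 says that the conjugate transpose satisfies the defining relation of the adjoint. A small numerical check with an illustrative matrix and vectors; the helper names are ours:

```python
def inner(x, y):
    """Standard inner product on C^n: (x, y) = sum_i x_i^* y_i."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def apply(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def dagger(A):
    """A^dagger = A^{*T}: conjugate every entry and transpose."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


A = [[1 + 2j, 3], [-1j, 4 - 1j]]   # matrix of T in the standard basis (illustrative)
u, v = [2 - 1j, 1j], [1, 3 + 1j]
lhs = inner(apply(dagger(A), u), v)   # (T^dagger u, v)
rhs = inner(u, apply(A, v))           # (u, T v)
print(abs(lhs - rhs) < 1e-12)         # True
```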

Example 10.2 Let us give an example that shows the importance of
finite-dimensionality in defining an adjoint operator. Consider the space $V = \mathbb{R}[x]$ of
all polynomials over $\mathbb{R}$, and let the inner product be as in Example 10.1.
Define the differentiation operator $D \in L(V)$ by $Df = df/dx$. We show that
there exists no adjoint operator $D^\dagger$ that satisfies $(Df, g) = (f, D^\dagger g)$.
Using $(Df, g) = (f, D^\dagger g)$, we integrate by parts to obtain

$$(f, D^\dagger g) = (Df, g) = \int_0^1 (Df)g\,dx = \int_0^1 [D(fg) - fDg]\,dx = (fg)(1) - (fg)(0) - (f, Dg).$$

Rearranging, this general result may be written as

$$(f, (D + D^\dagger)g) = (fg)(1) - (fg)(0).$$

We now let $f = x^2(1 - x)^2p$ for any $p \in V$. Then $f(1) = f(0) = 0$ so that we are
left with

$$0 = (f, (D + D^\dagger)g) = \int_0^1 x^2(1 - x)^2p\,(D + D^\dagger)g\,dx = (x^2(1 - x)^2(D + D^\dagger)g, p).$$

Since this is true for every $p \in V$, it follows that $x^2(1 - x)^2(D + D^\dagger)g = 0$. But
$x^2(1 - x)^2 > 0$ except at the endpoints, and hence we must have $(D + D^\dagger)g = 0$
for all $g \in V$, and thus $D + D^\dagger = 0$. However, the above general result then
yields

$$0 = (f, (D + D^\dagger)g) = (fg)(1) - (fg)(0)$$

which is certainly not true for every $f, g \in V$. Hence $D^\dagger$ must not exist.

We leave it to the reader to find where the infinite-dimensionality of $V =
\mathbb{R}[x]$ enters into this example. /
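The integration-by-parts identity $(Df, g) + (f, Dg) = (fg)(1) - (fg)(0)$ used in Example 10.2 can be checked exactly on particular polynomials. This sketch assumes polynomials are stored as lists of `Fraction` coefficients $[c_0, c_1, \dots]$ (our own representation; the helper names and the sample $f$, $g$ are illustrative). It also confirms that the factor $x^2(1 - x)^2$ kills the boundary term.

```python
from fractions import Fraction

# Polynomials over Q as coefficient lists [c0, c1, c2, ...].


def mul(f, g):
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def deriv(f):
    return [i * c for i, c in enumerate(f)][1:] or [Fraction(0)]


def ip(f, g):
    """(f, g) = integral over [0, 1] of f g dx, computed exactly term by term."""
    return sum(c / (k + 1) for k, c in enumerate(mul(f, g)))


def at(f, x):
    return sum(c * x**k for k, c in enumerate(f))


def boundary(f, g):
    fg = mul(f, g)
    return at(fg, 1) - at(fg, 0)   # (fg)(1) - (fg)(0)


f = [Fraction(c) for c in (1, -2, 0, 3)]   # 1 - 2x + 3x^3 (arbitrary)
g = [Fraction(c) for c in (2, 1, -1)]      # 2 + x - x^2 (arbitrary)
print(ip(deriv(f), g) + ip(f, deriv(g)) == boundary(f, g))   # True

bump = [Fraction(c) for c in (0, 0, 1, -2, 1)]   # x^2 (1 - x)^2
f2 = mul(bump, f)
print(boundary(f2, g))   # 0
```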

While this example shows that not every operator on an infinite-dimensional
space has an adjoint, there are in fact some operators on some
infinite-dimensional spaces that do indeed have an adjoint. A particular example of
this is given in Exercise 10.1.3. In fact, the famous Riesz representation
theorem asserts that any continuous linear functional on a Hilbert space can be
written as an inner product with a unique fixed vector, just as in Theorem 10.1;
as a consequence, every continuous linear operator on a Hilbert space does
indeed have an adjoint. While this fact should be well known to anyone who
has studied quantum mechanics, we defer further discussion until Chapter 12
(see Theorem 12.26).

As defined previously, an operator $T \in L(V)$ is Hermitian (or self-adjoint)
if $T^\dagger = T$. The elementary properties of the adjoint operator $T^\dagger$ are
given in the following theorem. Note that if $V$ is a real vector space, then the
properties of the matrix representing an adjoint operator simply reduce to
those of the transpose. Hence, a real Hermitian operator is represented by a
(real) symmetric matrix.

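Theorem 10.3 Suppose $S, T \in L(V)$ and $c \in \mathbb{C}$. Then

(a) $(S + T)^\dagger = S^\dagger + T^\dagger$.

(b) $(cT)^\dagger = c^*T^\dagger$.

(c) $(ST)^\dagger = T^\dagger S^\dagger$.

(d) $T^{\dagger\dagger} = (T^\dagger)^\dagger = T$.

(e) $I^\dagger = I$ and $0^\dagger = 0$.

(f) $(T^\dagger)^{-1} = (T^{-1})^\dagger$.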

Proof Let $u, v \in V$ be arbitrary. Then, from the definitions, we have

(a) $((S + T)^\dagger u, v) = (u, (S + T)v) = (u, Sv + Tv) = (u, Sv) + (u, Tv)$
$= (S^\dagger u, v) + (T^\dagger u, v) = ((S^\dagger + T^\dagger)u, v)$.

(b) $((cT)^\dagger u, v) = (u, cTv) = c(u, Tv) = c(T^\dagger u, v) = (c^*T^\dagger u, v)$.

(c) $((ST)^\dagger u, v) = (u, (ST)v) = (u, S(Tv)) = (S^\dagger u, Tv)$
$= (T^\dagger(S^\dagger u), v) = ((T^\dagger S^\dagger)u, v)$.

(d) This was shown in the discussion preceding Theorem 10.2.

(e) $(Iu, v) = (u, v) = (u, Iv) = (I^\dagger u, v)$.

$(0u, v) = (0, v) = 0 = (u, 0v) = (0^\dagger u, v)$.

(f) $I = I^\dagger = (TT^{-1})^\dagger = (T^{-1})^\dagger T^\dagger$ so that $(T^{-1})^\dagger = (T^\dagger)^{-1}$.

The proof is completed by noting that the adjoint and inverse operators are
unique. ∎

Corollary If $T \in L(V)$ is nonsingular, then so is $T^\dagger$.

Proof This follows from Theorems 10.3(f) and 5.10. ∎

We now group together several other useful properties of operators for 
easy reference. 

Theorem 10.4 (a) Let $V$ be an inner product space over either $\mathbb{R}$ or $\mathbb{C}$, let $T \in L(V)$, and suppose that $\langle u, Tv\rangle = 0$ for all $u, v \in V$. Then $T = 0$.

(b) Let $V$ be an inner product space over $\mathbb{C}$, let $T \in L(V)$, and suppose that $\langle u, Tu\rangle = 0$ for all $u \in V$. Then $T = 0$.

(c) Let $V$ be a real inner product space, let $T \in L(V)$ be Hermitian, and suppose that $\langle u, Tu\rangle = 0$ for all $u \in V$. Then $T = 0$.

Proof (a) Let $u = Tv$. Then, by definition of the inner product, we see that $\langle Tv, Tv\rangle = 0$ implies $Tv = 0$ for all $v \in V$, which implies that $T = 0$.

(b) For any $u, v \in V$ we have (by hypothesis)
$$\begin{aligned}
0 &= \langle u + v, T(u+v)\rangle \\
  &= \langle u, Tu\rangle + \langle u, Tv\rangle + \langle v, Tu\rangle + \langle v, Tv\rangle \\
  &= 0 + \langle u, Tv\rangle + \langle v, Tu\rangle + 0 \\
  &= \langle u, Tv\rangle + \langle v, Tu\rangle. \qquad (*)
\end{aligned}$$
Since $v$ is arbitrary, we may replace it with $iv$ to obtain
$$0 = i\langle u, Tv\rangle - i\langle v, Tu\rangle.$$
Dividing this by $i$ and adding to $(*)$ results in $0 = 2\langle u, Tv\rangle$ for any $u, v \in V$. By (a), this implies that $T = 0$.

(c) For any $u, v \in V$ we have $\langle u + v, T(u+v)\rangle = 0$, which also yields $(*)$. Therefore, using $(*)$, the fact that $T^\dagger = T$, and the fact that $V$ is real, we obtain
$$0 = \langle T^\dagger u, v\rangle + \langle v, Tu\rangle = \langle Tu, v\rangle + \langle v, Tu\rangle = \langle v, Tu\rangle + \langle v, Tu\rangle = 2\langle v, Tu\rangle.$$
Since this holds for any $u, v \in V$, we have $T = 0$ by (a). (Note that in this particular case, $T^\dagger = T^T$.) ∎
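Parts (b) and (c) genuinely need their extra hypotheses. As an illustration (our own choice of example, not taken from the text), consider the rotation of $\mathbb{R}^2$ through $90^\circ$: $\langle u, Tu\rangle = 0$ for every real $u$, yet $T \neq 0$. This does not contradict (c), because $T$ is not symmetric. Over $\mathbb{C}$ the same matrix gives a nonzero value at $u = (1, i)$, as (b) requires.

```python
import numpy as np

# Rotation of R^2 through 90 degrees: T(x, y) = (-y, x). It is not symmetric.
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])

rng = np.random.default_rng(1)
real_us = rng.standard_normal((5, 2))
# Real form <u, Tu> = u . (Tu) for several random real vectors u.
real_forms = np.array([u @ (T @ u) for u in real_us])

# Complex test vector u = (1, i); np.vdot conjugates its first argument.
u_c = np.array([1.0, 1.0j])
complex_form = np.vdot(u_c, T @ u_c)

print("<u, Tu> = 0 for sampled real u:", np.allclose(real_forms, 0.0))
print("T is the zero map:", np.allclose(T, 0.0))
print("T is symmetric:", np.allclose(T, T.T))
print("<u, Tu> at u = (1, i):", complex_form)   # equals -2j, so T is detected over C
```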

Exercises 

1. Suppose $S, T \in L(V)$.

(a) If $S$ and $T$ are Hermitian, show that $ST$ and $TS$ are Hermitian if and only if $[S, T] = ST - TS = 0$.

(b) If $T$ is Hermitian, show that $S^\dagger TS$ is Hermitian for all $S$.

(c) If $S$ is nonsingular and $S^\dagger TS$ is Hermitian, show that $T$ is Hermitian.

2. Consider $V = M_n(\mathbb{C})$ with the inner product $\langle A, B\rangle = \operatorname{Tr}(B^\dagger A)$. For each $M \in V$, define the operator $T_M \in L(V)$ by $T_M(A) = MA$. Show that $(T_M)^\dagger = T_{M^\dagger}$.

3. Consider the space $V = \mathbb{C}[x]$. If $f = \sum a_i x^i \in V$, we define the complex conjugate of $f$ to be the polynomial $f^* = \sum a_i^* x^i \in V$. In other words, if $t \in \mathbb{R}$, then $f^*(t) = (f(t))^*$. We define an inner product on $V$ by
$$\langle f, g\rangle = \int_0^1 f^*(t)g(t)\,dt.$$
For each $f \in V$, define the operator $T_f \in L(V)$ by $T_f(g) = fg$. Show that $(T_f)^\dagger = T_{f^*}$.

4. Let $V$ be the space of all real polynomials of degree $\le 3$, and define an inner product on $V$ by
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$
For any $t \in \mathbb{R}$, find a polynomial $h_t \in V$ such that $\langle h_t, f\rangle = f(t)$ for all $f \in V$.

5. If $V$ is as in the previous exercise and $D$ is the usual differentiation operator on $V$, find $D^\dagger$.



6. Let $V = \mathbb{C}^2$ with the standard inner product.

(a) Define $T \in L(V)$ by $Te_1 = (1, -2)$, $Te_2 = (i, -1)$. If $v = (z_1, z_2)$, find $T^\dagger v$.

(b) Define $T \in L(V)$ by $Te_1 = (1 + i, 2)$, $Te_2 = (i, i)$. Find the matrix representation of $T^\dagger$ relative to the usual basis for $V$. Is it true that $[T, T^\dagger] = 0$?

7. Let $V$ be a finite-dimensional inner product space and suppose $T \in L(V)$. Show that $\operatorname{Im} T^\dagger = (\operatorname{Ker} T)^\perp$.

8. Let $V$ be a finite-dimensional inner product space, and suppose $E \in L(V)$ is idempotent, i.e., $E^2 = E$. Prove that $E^\dagger = E$ if and only if $[E, E^\dagger] = 0$.

9. For each of the following inner product spaces $V$ and $L \in V^*$, find a vector $u \in V$ such that $Lv = \langle u, v\rangle$ for all $v \in V$:

(a) $V = \mathbb{R}^3$ and $L(x, y, z) = x - 2y + 4z$.

(b) $V = \mathbb{C}^2$ and $L(z_1, z_2) = z_1 - z_2$.

(c) $V$ is the space of all real polynomials of degree $\le 2$ with inner product as in Exercise 4, and $Lf = f(0) + Df(1)$. (Here $D$ is the usual differentiation operator.)

10. (a) Let $V = \mathbb{R}^2$, and define $T \in L(V)$ by $T(x, y) = (2x + y, x - 3y)$. Find $T^\dagger(3, 5)$.

(b) Let $V = \mathbb{C}^2$, and define $T \in L(V)$ by $T(z_1, z_2) = (2z_1 + iz_2, (1 - i)z_1)$. Find $T^\dagger(3 - i, 1 + 2i)$.

(c) Let $V$ be as in Exercise 9(c), and define $T \in L(V)$ by $Tf = 3f + Df$. Find $T^\dagger f$ where $f = 3x^2 - x + 4$.



10.2 ISOMETRIC AND UNITARY OPERATORS 

Let $V$ be a complex inner product space with the induced norm. Another important class of operators $U \in L(V)$ is that for which $\|Uv\| = \|v\|$ for all $v \in V$. Such operators are called isometric because they preserve the length of the vector $v$. Furthermore, for any $v, w \in V$ we see that
$$\|Uv - Uw\| = \|U(v - w)\| = \|v - w\|$$
so that $U$ preserves distances as well. This is sometimes described by saying that $U$ is an isometry.

If we write out the norm as an inner product and assume that the adjoint operator exists, we see that an isometric operator satisfies
$$\langle v, v\rangle = \langle Uv, Uv\rangle = \langle v, (U^\dagger U)v\rangle$$
and hence $\langle v, (U^\dagger U - 1)v\rangle = 0$ for any $v \in V$. But then from Theorem 10.4(b) it follows that
$$U^\dagger U = 1.$$
In fact, this is sometimes taken as the definition of an isometric operator. Note that this applies equally well to an infinite-dimensional space.

If $V$ is finite-dimensional, then (Theorems 3.21 and 5.13) it follows that $U^\dagger = U^{-1}$, and hence
$$U^\dagger U = UU^\dagger = 1.$$
Any operator that satisfies either $U^\dagger U = UU^\dagger = 1$ or $U^\dagger = U^{-1}$ is said to be unitary. It is clear that a unitary operator is necessarily isometric. If $V$ is simply a real space, then unitary operators are called orthogonal.

Because of the importance of isometric and unitary operators in both mathematics and physics, it is worth arriving at both of these definitions from a slightly different viewpoint that also aids in our understanding of these operators. Let $V$ be a complex vector space with an inner product defined on it. We say that an operator $U$ is unitary if $\|Uv\| = \|v\|$ for all $v \in V$ and, in addition, $U$ maps $V$ onto itself. Since $\|Uv\| = \|v\|$, we see that $Uv = 0$ if and only if $v = 0$, and hence $\operatorname{Ker} U = \{0\}$. Therefore $U$ is one-to-one and $U^{-1}$ exists (Theorem 5.5). Since $U$ is surjective, the inverse is defined on all of $V$ also. Note that there has been no mention of finite-dimensionality. This was avoided by requiring that the mapping be surjective.

Starting from $\|Uv\| = \|v\|$, we may write $\langle v, (U^\dagger U)v\rangle = \langle v, v\rangle$. As we did in the proof of Theorem 10.4, if we first substitute $v = v_1 + v_2$ and then $v = v_1 + iv_2$, divide the second of these equations by $i$ and then add to the first, we find that $\langle v_1, (U^\dagger U)v_2\rangle = \langle v_1, v_2\rangle$. Since this holds for all $v_1, v_2 \in V$, it follows that $U^\dagger U = 1$. If we now multiply this equation from the left by $U$ we have $UU^\dagger U = U$, and hence $(UU^\dagger)(Uv) = Uv$ for all $v \in V$. But as $v$ varies over all of $V$, so does $Uv$, since $U$ is surjective. We then define $v' = Uv$ so that $(UU^\dagger)v' = v'$ for all $v' \in V$. This shows that $U^\dagger U = 1$ implies $UU^\dagger = 1$. What we have just done, then, is show that a surjective norm-preserving operator $U$ has the property that $U^\dagger U = UU^\dagger = 1$. It is important to emphasize that this approach is equally valid in infinite-dimensional spaces.

We now define an isometric operator $Q$ to be an operator defined on all of $V$ with the property that $\|Qv\| = \|v\|$ for all $v \in V$. This differs from a unitary operator in that we do not require that $Q$ also be surjective. Again, the requirement that $Q$ preserve the norm tells us that $Q$ has an inverse (since it must be one-to-one), but this inverse is not necessarily defined on the whole of $V$. For example, let $\{e_i\}_{i=1}^{\infty}$ be an orthonormal basis for an infinite-dimensional space $V$, and define the "shift operator" $Q$ by
$$Q(e_i) = e_{i+1}.$$
This $Q$ is clearly defined on all of $V$, but the image of $Q$ is not all of $V$ since it does not include the vector $e_1$. Thus, $Q^{-1}$ is not defined on $e_1$.

Exactly as we did for unitary operators, we can show that $Q^\dagger Q = 1$ for an isometric operator $Q$. If $V$ happens to be finite-dimensional, then obviously $QQ^\dagger = 1$. Thus, on a finite-dimensional space, an isometric operator is also unitary.

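Finally, let us show an interesting relationship between the inverse $Q^{-1}$ of an isometric operator and its adjoint $Q^\dagger$. From $Q^\dagger Q = 1$, we may write $Q^\dagger(Qv) = v$ for every $v \in V$. If we define $Qv = v'$, then for every $v' \in \operatorname{Im} Q$ we have $v = Q^{-1}v'$, and hence
$$Q^\dagger v' = Q^{-1}v' \qquad \text{for } v' \in \operatorname{Im} Q.$$
On the other hand, if $w' \in (\operatorname{Im} Q)^\perp$, then automatically $\langle w', Qv\rangle = 0$ for every $v \in V$. Therefore this may be written as $\langle Q^\dagger w', v\rangle = 0$ for every $v \in V$, and hence (choose $v = Q^\dagger w'$)
$$Q^\dagger w' = 0 \qquad \text{for } w' \in (\operatorname{Im} Q)^\perp.$$
In other words, we have
$$Q^\dagger = \begin{cases} Q^{-1} & \text{on } \operatorname{Im} Q \\ 0 & \text{on } (\operatorname{Im} Q)^\perp. \end{cases}$$
For instance, using our earlier example of the shift operator, we see that $\langle e_1, e_i\rangle = 0$ for $i \neq 1$, and hence $e_1 \in (\operatorname{Im} Q)^\perp$. Therefore $Q^\dagger(e_1) = 0$, so that we clearly cannot have $QQ^\dagger = 1$.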
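The shift operator lives on an infinite-dimensional space, but its key features can be imitated by a finite truncation. As an assumption made purely for illustration, the sketch below treats $Q$ as the map $\mathbb{C}^n \to \mathbb{C}^{n+1}$ with $Qe_i = e_{i+1}$; this is a rectangular matrix, so it is an isometry between two different spaces rather than an operator on one space. It confirms that $Q^\dagger Q = 1$, that $Q^\dagger$ undoes $Q$ on $\operatorname{Im} Q$, and that $Q^\dagger e_1 = 0$, so $QQ^\dagger \neq 1$.

```python
import numpy as np

n = 4
# Truncated shift: column i of Q is e_(i+1) in C^(n+1) (0-based: Q[i+1, i] = 1).
Q = np.zeros((n + 1, n))
for i in range(n):
    Q[i + 1, i] = 1.0
Q_dag = Q.conj().T

rng = np.random.default_rng(2)
v = rng.standard_normal(n)
e1 = np.eye(n + 1)[0]          # e_1 spans (Im Q)^perp in this truncation

isometric = np.allclose(Q_dag @ Q, np.eye(n))           # Q^† Q = 1
inverse_on_image = np.allclose(Q_dag @ (Q @ v), v)      # Q^† = Q^{-1} on Im Q
kills_e1 = np.allclose(Q_dag @ e1, 0.0)                 # Q^† e_1 = 0
co_isometric = np.allclose(Q @ Q_dag, np.eye(n + 1))    # fails: Q Q^† != 1

print(isometric, inverse_on_image, kills_e1, co_isometric)   # True True True False
```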

Our next theorem summarizes some of this discussion. 



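Theorem 10.5 Let $V$ be a complex finite-dimensional inner product space. Then the following conditions on an operator $U \in L(V)$ are equivalent:

(a) $U^\dagger = U^{-1}$.

(b) $\langle Uv, Uw\rangle = \langle v, w\rangle$ for all $v, w \in V$.

(c) $\|Uv\| = \|v\|$ for all $v \in V$.

Proof (a) $\Rightarrow$ (b): $\langle Uv, Uw\rangle = \langle v, (U^\dagger U)w\rangle = \langle v, Iw\rangle = \langle v, w\rangle$.

(b) $\Rightarrow$ (c): $\|Uv\| = \langle Uv, Uv\rangle^{1/2} = \langle v, v\rangle^{1/2} = \|v\|$.

(c) $\Rightarrow$ (a): $\langle v, (U^\dagger U)v\rangle = \langle Uv, Uv\rangle = \langle v, v\rangle = \langle v, Iv\rangle$, and therefore $\langle v, (U^\dagger U - I)v\rangle = 0$. Hence (by Theorem 10.4(b)) we must have $U^\dagger U = I$, and thus $U^\dagger = U^{-1}$ (since $V$ is finite-dimensional). ∎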
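A quick numerical companion to Theorem 10.5. The assumption here is that a unitary matrix can be manufactured as the $Q$ factor of a QR factorization of a random complex matrix (`np.linalg.qr` returns a square factor with orthonormal columns); conditions (a), (b) and (c) are then observed to hold together.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)               # U has orthonormal columns, hence is unitary

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

cond_a = np.allclose(U.conj().T, np.linalg.inv(U))                 # U^† = U^{-1}
cond_b = np.isclose(np.vdot(U @ v, U @ w), np.vdot(v, w))          # <Uv, Uw> = <v, w>
cond_c = np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v))      # ||Uv|| = ||v||
print(cond_a, cond_b, cond_c)
```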
From part (c) of this theorem we see that $U$ preserves the length of any vector. In particular, $U$ preserves the length of a unit vector, hence the designation "unitary." Note also that if $v$ and $w$ are orthogonal, then $\langle v, w\rangle = 0$ and hence $\langle Uv, Uw\rangle = \langle v, w\rangle = 0$. Thus $U$ maintains orthogonality as well.

Condition (b) of this theorem is sometimes described by saying that a unitary transformation preserves inner products. In general, we say that a linear transformation (i.e., a vector space homomorphism) $T$ of an inner product space $V$ onto an inner product space $W$ (over the same field) is an inner product space isomorphism of $V$ onto $W$ if it also preserves inner products. Therefore, one may define a unitary operator as an inner product space isomorphism.

It is also worth commenting on the case of unitary operators defined on a real vector space. Since in this case the adjoint reduces to the transpose, we have $U^\dagger = U^T = U^{-1}$. If $V$ is a real vector space, then an operator $T \in L(V)$ that satisfies $T^T = T^{-1}$ is said to be an orthogonal transformation. It should be clear that Theorem 10.5 also applies to real vector spaces if we replace the adjoint by the transpose. We will have more to say about orthogonal transformations below.

Theorem 10.6 Let $V$ be finite-dimensional over $\mathbb{C}$ (resp. $\mathbb{R}$). A linear transformation $U \in L(V)$ is unitary (resp. orthogonal) if and only if it takes an orthonormal basis for $V$ into an orthonormal basis for $V$.

Proof We consider the case where $V$ is complex, leaving the real case to the reader. Let $\{e_i\}$ be an orthonormal basis for $V$, and assume that $U$ is unitary. Then from Theorem 10.5(b) we have
$$\langle Ue_i, Ue_j\rangle = \langle e_i, e_j\rangle = \delta_{ij}$$
so that $\{Ue_i\}$ is also an orthonormal set. But any orthonormal set is linearly independent (Theorem 2.19), and hence $\{Ue_i\}$ forms a basis for $V$ (since there are as many of the $Ue_i$ as there are $e_i$).

Conversely, suppose that both $\{e_i\}$ and $\{Ue_i\}$ are orthonormal bases for $V$ and let $v, w \in V$ be arbitrary. Then
$$\langle v, w\rangle = \Big\langle \sum_i v^i e_i, \sum_j w^j e_j\Big\rangle = \sum_{i,j} v^{i*} w^j \langle e_i, e_j\rangle = \sum_{i,j} v^{i*} w^j \delta_{ij} = \sum_i v^{i*} w^i.$$
However, we also have
$$\langle Uv, Uw\rangle = \Big\langle U\Big(\sum_i v^i e_i\Big), U\Big(\sum_j w^j e_j\Big)\Big\rangle = \sum_{i,j} v^{i*} w^j \langle Ue_i, Ue_j\rangle = \sum_{i,j} v^{i*} w^j \delta_{ij} = \sum_i v^{i*} w^i = \langle v, w\rangle.$$
This shows that $U$ is unitary (Theorem 10.5). ∎
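The converse half of Theorem 10.6 suggests a recipe. Given orthonormal bases $\{e_i\}$ and $\{\bar e_i\}$ of $\mathbb{C}^n$, stored as the columns of matrices `E` and `F` (names chosen here for illustration), the operator with $Ue_i = \bar e_i$ has matrix $U = FE^\dagger$ in the standard basis. The sketch, which again assumes QR factors as a source of orthonormal bases, checks that this $U$ sends each $e_i$ to $\bar e_i$ and is unitary.

```python
import numpy as np

def random_orthonormal_basis(n, rng):
    """Columns of the returned matrix form an orthonormal basis of C^n."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

rng = np.random.default_rng(4)
n = 3
E = random_orthonormal_basis(n, rng)   # columns e_1, ..., e_n
F = random_orthonormal_basis(n, rng)   # columns ebar_1, ..., ebar_n

U = F @ E.conj().T                     # chosen so that U e_i = ebar_i

maps_basis = np.allclose(U @ E, F)                     # U e_i = ebar_i for every i
is_unitary = np.allclose(U.conj().T @ U, np.eye(n))    # U^† U = I
print(maps_basis, is_unitary)   # True True
```

The same construction, with `F` holding an orthonormal basis of a second space $W$ of the same dimension, is the map $Ue_i = \bar e_i$ used in the following corollary.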

Corollary Let $V$ and $W$ be finite-dimensional inner product spaces over $\mathbb{C}$. Then there exists an inner product space isomorphism of $V$ onto $W$ if and only if $\dim V = \dim W$.

Proof Clearly $\dim V = \dim W$ if $V$ and $W$ are isomorphic. On the other hand, let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $V$, and let $\{\bar e_1, \dots, \bar e_n\}$ be an orthonormal basis for $W$. (These bases exist by Theorem 2.21.) We define the (surjective) linear transformation $U$ by the requirement $Ue_i = \bar e_i$. $U$ is unique by Theorem 5.1. Since $\langle Ue_i, Ue_j\rangle = \langle \bar e_i, \bar e_j\rangle = \delta_{ij} = \langle e_i, e_j\rangle$, the proof of Theorem 10.6 shows that $U$ preserves inner products. In particular, we see that $\|Uv\| = \|v\|$ for every $v \in V$, and hence $\operatorname{Ker} U = \{0\}$ (by property (N1) of Theorem 2.17). Thus $U$ is also one-to-one (Theorem 5.5). ∎

From Theorem 10.2 we see that a complex matrix $A$ represents a unitary operator relative to an orthonormal basis if and only if $A^\dagger = A^{-1}$. We therefore say that a complex matrix $A$ is a unitary matrix if $A^\dagger = A^{-1}$. In the special case that $A$ is a real matrix with the property that $A^T = A^{-1}$, we say that $A$ is an orthogonal matrix. (These classes of matrices were also discussed in Section 8.1.) The reason for this designation is shown in the next example, which is really nothing more than another way of looking at what we have done so far.

Example 10.3 Suppose $V = \mathbb{R}^n$ and $X \in V$. In terms of an orthonormal basis $\{e_i\}$ for $V$ we may write $X = \sum_i x^i e_i$. Now suppose we are given another orthonormal basis $\{\bar e_i\}$ related to the first basis by $\bar e_i = A(e_i) = \sum_j e_j a_{ji}$ for some real matrix $(a_{ij})$. The image $A(X) = \bar X = \sum_j x^j \bar e_j$ then has components $\bar x^i = \sum_j a_{ij} x^j$ relative to $\{e_i\}$ (see Section 5.4). Then
$$\begin{aligned}
\|\bar X\|^2 &= \Big\langle \sum_i \bar x^i e_i, \sum_j \bar x^j e_j\Big\rangle = \sum_{i,j} \bar x^i \bar x^j \langle e_i, e_j\rangle = \sum_{i,j} \bar x^i \bar x^j \delta_{ij} \\
&= \sum_i (\bar x^i)^2 = \sum_{i,j,k} a_{ij} a_{ik} x^j x^k = \sum_{i,j,k} a^T_{ji} a_{ik} x^j x^k \\
&= \sum_{j,k} (A^T A)_{jk} x^j x^k.
\end{aligned}$$
If $A$ is orthogonal, then $A^T = A^{-1}$ so that $(A^T A)_{jk} = \delta_{jk}$ and we are left with
$$\|\bar X\|^2 = \sum_i (\bar x^i)^2 = \sum_j (x^j)^2 = \|X\|^2$$
so that the length of $X$ is unchanged under an orthogonal transformation. An equivalent way to see this is to assume that $A$ simply represents a rotation so that the length of a vector remains unchanged by definition. This then forces $A$ to be an orthogonal transformation (see Exercise 10.2.2).

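Another way to think of orthogonal transformations is the following. We saw in Section 2.4 that the angle between two vectors $X, Y \in \mathbb{R}^n$ is defined by
$$\cos\theta = \frac{\langle X, Y\rangle}{\|X\|\,\|Y\|}.$$
Under the orthogonal transformation $A$, we then have $\bar X = A(X)$ and also
$$\cos\bar\theta = \frac{\langle \bar X, \bar Y\rangle}{\|\bar X\|\,\|\bar Y\|}.$$
But $\|\bar X\| = \|X\|$ and $\|\bar Y\| = \|Y\|$, and in addition,
$$\begin{aligned}
\langle \bar X, \bar Y\rangle &= \Big\langle \sum_i \bar x^i e_i, \sum_j \bar y^j e_j\Big\rangle = \sum_i \bar x^i \bar y^i = \sum_{i,j,k} a_{ij} a_{ik} x^j y^k \\
&= \sum_{j,k} \delta_{jk} x^j y^k = \sum_j x^j y^j = \langle X, Y\rangle
\end{aligned}$$
so that $\bar\theta = \theta$ (this also follows from the real vector space version of Theorem 10.5). Therefore an orthogonal transformation also preserves the angle between two vectors, and hence is nothing more than a rotation in $\mathbb{R}^n$ (possibly combined with a reflection, when $\det A = -1$).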
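The computation of Example 10.3 can be mirrored numerically. In the sketch below (an illustration under our own choices), a real orthogonal $A$ is taken from a QR factorization, the components $\bar x^i = \sum_j a_{ij} x^j$ are computed as `A @ x`, and both the length of a vector and the angle between two vectors come out unchanged. The helper `angle` simply implements the definition of $\cos\theta$ quoted above.

```python
import numpy as np

def angle(x, y):
    """Angle defined by cos(theta) = <x, y> / (||x|| ||y||)."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against rounding past +-1

rng = np.random.default_rng(5)
n = 4
A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # real orthogonal: A^T A = I

x = rng.standard_normal(n)
y = rng.standard_normal(n)
x_bar, y_bar = A @ x, A @ y                         # xbar^i = sum_j a_ij x^j

same_length = np.isclose(np.linalg.norm(x_bar), np.linalg.norm(x))
same_angle = np.isclose(angle(x_bar, y_bar), angle(x, y))
print(same_length, same_angle)
```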

Theorem 10.7 The following conditions on a matrix $A$ are equivalent:

(a) $A$ is unitary.

(b) The rows $A_i$ of $A$ form an orthonormal set.

(c) The columns $A^i$ of $A$ form an orthonormal set.

Proof We begin by noting that, using the usual inner product on $\mathbb{C}^n$, we have
$$(AA^\dagger)_{ij} = \sum_k a_{ik} a^\dagger_{kj} = \sum_k a_{ik} a^*_{jk} = \sum_k a^*_{jk} a_{ik} = \langle A_j, A_i\rangle$$
and
$$(A^\dagger A)_{ij} = \sum_k a^\dagger_{ik} a_{kj} = \sum_k a^*_{ki} a_{kj} = \langle A^i, A^j\rangle.$$
Now, if $A$ is unitary, then $AA^\dagger = I$ implies $(AA^\dagger)_{ij} = \delta_{ij}$, which then implies that $\langle A_j, A_i\rangle = \delta_{ij}$, so that (a) is equivalent to (b). Similarly, we must have $(A^\dagger A)_{ij} = \delta_{ij} = \langle A^i, A^j\rangle$, so that (a) is also equivalent to (c). Therefore (b) must also be equivalent to (c). ∎
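Theorem 10.7 says that one test, on rows or on columns, is enough. The sketch below computes the Gram matrices of the rows and of the columns directly, for one unitary matrix and one non-unitary matrix (both are illustrative choices, not from the text), so the three conditions can be compared side by side.

```python
import numpy as np

def rows_orthonormal(A):
    # Entry (i, j) of A.conj() @ A.T is <A_i, A_j> = sum_k conj(a_ik) a_jk.
    return np.allclose(A.conj() @ A.T, np.eye(A.shape[0]))

def columns_orthonormal(A):
    # Entry (i, j) of A^† A is <A^i, A^j> = sum_k conj(a_ki) a_kj.
    return np.allclose(A.conj().T @ A, np.eye(A.shape[1]))

def is_unitary(A):
    return np.allclose(A @ A.conj().T, np.eye(A.shape[0]))

U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)          # unitary
B = np.array([[1, 1],
              [0, 1]], dtype=complex)         # invertible but not unitary

results = [(is_unitary(M), rows_orthonormal(M), columns_orthonormal(M)) for M in (U, B)]
print(results)   # [(True, True, True), (False, False, False)]
```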

Note that the equivalence of (b) and (c) in this theorem means that the rows of $A$ form an orthonormal set if and only if the columns of $A$ form an orthonormal set. But the rows of $A$ are just the columns of $A^T$, and hence $A$ is unitary if and only if $A^T$ is unitary.

It should be obvious that this theorem applies just as well to orthogonal matrices. Looking at this in the other direction, we see that in this case $A^T = A^{-1}$ so that $A^T A = AA^T = I$, and therefore
$$(A^T A)_{ij} = \sum_k a^T_{ik} a_{kj} = \sum_k a_{ki} a_{kj} = \delta_{ij}$$
$$(AA^T)_{ij} = \sum_k a_{ik} a^T_{kj} = \sum_k a_{ik} a_{jk} = \delta_{ij}.$$
Viewing the standard (orthonormal) basis $\{e_i\}$ for $\mathbb{R}^n$ as row vectors, we have $A_i = \sum_j a_{ij} e_j$ and hence
$$\langle A_i, A_j\rangle = \Big\langle \sum_k a_{ik} e_k, \sum_r a_{jr} e_r\Big\rangle = \sum_{k,r} a_{ik} a_{jr} \langle e_k, e_r\rangle = \sum_{k,r} a_{ik} a_{jr} \delta_{kr} = \sum_k a_{ik} a_{jk} = \delta_{ij}.$$
Furthermore, it is easy to see that a similar result holds for the columns of $A$.

Our next theorem details several useful properties of orthogonal and unitary matrices.

Theorem 10.8 (a) If $A$ is an orthogonal matrix, then $\det A = \pm 1$.

(b) If $U$ is a unitary matrix, then $|\det U| = 1$. Alternatively, $\det U = e^{i\phi}$ for some real number $\phi$.

Proof (a) We have $AA^T = I$, and hence (from Theorems 4.8 and 4.1)
$$1 = \det I = \det(AA^T) = (\det A)(\det A^T) = (\det A)^2$$
so that $\det A = \pm 1$.

(b) If $UU^\dagger = I$ then, as above, we have
$$1 = \det I = \det(UU^\dagger) = (\det U)(\det U^\dagger) = (\det U)(\det U^T)^* = (\det U)(\det U)^* = |\det U|^2.$$
Since the absolute value is by definition nonnegative, this shows that $|\det U| = 1$, and hence $\det U = e^{i\phi}$ for some real $\phi$. ∎
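Theorem 10.8 can be observed directly on examples. In the sketch below a rotation has determinant $+1$, a reflection (our own illustrative choice) has determinant $-1$, and a random unitary matrix, again assumed to come from a QR factorization, has determinant of modulus one, which can be written as $e^{i\phi}$ with $\phi$ recovered by `np.angle`.

```python
import numpy as np

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[1.0, 0.0],
                       [0.0, -1.0]])             # orthogonal, det = -1

rng = np.random.default_rng(6)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(Z)                           # unitary

det_U = np.linalg.det(U)
phi = np.angle(det_U)                            # det U = exp(i phi)

print(np.linalg.det(rotation), np.linalg.det(reflection))   # approximately 1.0 and -1.0
print(abs(det_U), np.isclose(det_U, np.exp(1j * phi)))     # approximately 1.0, True
```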

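Example 10.4 Let us take a look at rotations in $\mathbb{R}^2$ as shown, for example, in the figure below. Recall from Example 10.3 that if we have two bases $\{e_i\}$ and $\{\bar e_i\}$, then they are related by a transition matrix $A = (a_{ij})$ defined by $\bar e_i = \sum_j e_j a_{ji}$. In addition, if $X = \sum_i x^i e_i = \sum_i \bar x^i \bar e_i$, then $x^i = \sum_j a_{ij} \bar x^j$. If both $\{e_i\}$ and $\{\bar e_i\}$ are orthonormal bases, then
$$\langle e_i, \bar e_j\rangle = \Big\langle e_i, \sum_k e_k a_{kj}\Big\rangle = \sum_k a_{kj} \langle e_i, e_k\rangle = \sum_k a_{kj} \delta_{ik} = a_{ij}.$$
Using the usual dot product on $\mathbb{R}^2$ as our inner product (see Section 2.4, Lemma 2.3) and referring to the figure below, we see that the elements $a_{ij}$ are given by (also see Section 0.6 for the trigonometric identities)
$$\begin{aligned}
a_{11} &= e_1 \cdot \bar e_1 = |e_1|\,|\bar e_1| \cos\theta = \cos\theta \\
a_{12} &= e_1 \cdot \bar e_2 = |e_1|\,|\bar e_2| \cos(\pi/2 + \theta) = -\sin\theta \\
a_{21} &= e_2 \cdot \bar e_1 = |e_2|\,|\bar e_1| \cos(\pi/2 - \theta) = \sin\theta \\
a_{22} &= e_2 \cdot \bar e_2 = |e_2|\,|\bar e_2| \cos\theta = \cos\theta
\end{aligned}$$

[Figure: the orthonormal basis $\{\bar e_1, \bar e_2\}$ obtained by rotating $\{e_1, e_2\}$ through the angle $\theta$.]

Thus the matrix $A$ is given by
$$A = (a_{ij}) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
We leave it to the reader to compute directly that $A^T A = AA^T = I$ and $\det A = +1$.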

Example 10.5 Referring to the previous example, we can show that any (real) $2 \times 2$ orthogonal matrix with $\det A = +1$ has the form
$$(a_{ij}) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
for some $\theta \in \mathbb{R}$. To see this, suppose $A$ has the form
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
where $a, b, c, d \in \mathbb{R}$. Since $A$ is orthogonal, its rows form an orthonormal set, and hence we have
$$a^2 + b^2 = 1, \qquad c^2 + d^2 = 1, \qquad ac + bd = 0, \qquad ad - bc = 1$$
where the last equation follows from $\det A = 1$.

If $a = 0$, then the first of these equations yields $b = \pm 1$, the third then yields $d = 0$, and the last yields $-c = 1/b = \pm 1$, which is equivalent to $c = -b$. In other words, if $a = 0$, then $A$ has either of the following forms:
$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
The first of these is of the required form if we choose $\theta = -90^\circ = -\pi/2$, and the second is of the required form if we choose $\theta = +90^\circ = +\pi/2$.

Now suppose that $a \neq 0$. From the third equation we have $c = -bd/a$, and substituting this into the second equation, we find $(a^2 + b^2)d^2 = a^2$. Using the first equation, this becomes $a^2 = d^2$, or $a = \pm d$. If $a = -d$, then the third equation yields $b = c$, and hence the last equation yields $-a^2 - b^2 = 1$, which is impossible. Therefore $a = d$, the third equation then yields $c = -b$, and we are left with
$$A = \begin{pmatrix} a & -c \\ c & a \end{pmatrix}.$$
Since $\det A = a^2 + c^2 = 1$, there exists a real number $\theta$ such that $a = \cos\theta$ and $c = \sin\theta$, which gives us the desired form for $A$.
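The argument of Example 10.5 is constructive: for a real $2 \times 2$ orthogonal matrix with determinant $+1$ it shows $a = d = \cos\theta$ and $c = -b = \sin\theta$, so $\theta$ can be read off with the two-argument arctangent. The function name `rotation_angle` and its tolerance are hypothetical choices for this sketch, which also rejects matrices that fail the four equations of the example.

```python
import math

def rotation_angle(A, tol=1e-9):
    """Return theta in (-pi, pi] with A = [[cos t, -sin t], [sin t, cos t]].

    A is a 2x2 nested list of reals. Raises ValueError unless the rows of A are
    orthonormal and det A = +1 (the hypotheses of Example 10.5).
    """
    (a, b), (c, d) = A
    equations = [a * a + b * b - 1,      # a^2 + b^2 = 1
                 c * c + d * d - 1,      # c^2 + d^2 = 1
                 a * c + b * d,          # ac + bd = 0
                 a * d - b * c - 1]      # ad - bc = 1
    if any(abs(e) > tol for e in equations):
        raise ValueError("not a 2x2 orthogonal matrix with det +1")
    # The example shows a = d = cos(theta) and c = -b = sin(theta).
    return math.atan2(c, a)

t = 2.0
A = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]
print(rotation_angle(A))                    # approximately 2.0
print(rotation_angle([[0, 1], [-1, 0]]))    # -pi/2, the first a = 0 case
```

A reflection such as `[[1, 0], [0, -1]]` is orthogonal but has determinant $-1$, so the last equation fails and the function raises `ValueError`.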

Exercises 









1. Let $GL(n, \mathbb{C})$ denote the subset of $M_n(\mathbb{C})$ consisting of all nonsingular matrices, $U(n)$ the subset of all unitary matrices, and $L(n)$ the set of all nonsingular lower-triangular matrices.

(a) Show that each of these three sets forms a group.

(b) Show that any nonsingular $n \times n$ complex matrix can be written as a product of a nonsingular upper-triangular matrix and a unitary matrix. [Hint: Use Exercises 5.4.14 and 3.7.7.]

2. Let $V = \mathbb{R}^n$ with the standard inner product, and suppose the length of any $X \in V$ remains unchanged under $A \in L(V)$. Show that $A$ must be an orthogonal transformation.

3. Let $V$ be the space of all continuous complex-valued functions defined on $[0, 2\pi]$, and define an inner product on $V$ by



Suppose there exists $h \in V$ such that $|h(x)| = 1$ for all $x \in [0, 2\pi]$, and define $T_h \in L(V)$ by $T_h f = hf$. Prove that $T_h$ is unitary.

4. Let $W$ be a finite-dimensional subspace of an inner product space $V$, and recall that $V = W \oplus W^\perp$ (see Exercise 2.5.11). Define $U \in L(V)$ by
$$U(w_1 + w_2) = w_1 - w_2$$
where $w_1 \in W$ and $w_2 \in W^\perp$.

(a) Prove that $U$ is a Hermitian operator.

(b) Let $V = \mathbb{R}^3$ have the standard inner product, and let $W \subset V$ be spanned by the vector $(1, 0, 1)$. Find the matrix of $U$ relative to the standard basis for $V$.

5. Let $V$ be a finite-dimensional inner product space. An operator $Q \in L(V)$ is said to be a partial isometry if there exists a subspace $W$ of $V$ such that $\|Qw\| = \|w\|$ for all $w \in W$, and $\|Qw\| = 0$ for all $w \in W^\perp$. Let $Q$ be a partial isometry and suppose $\{w_1, \dots, w_k\}$ is an orthonormal basis for $W$.

(a) Show that $\langle Qu, Qv\rangle = \langle u, v\rangle$ for all $u, v \in W$. [Hint: Use Exercise 2.4.7.]

(b) Show that $\{Qw_1, \dots, Qw_k\}$ is an orthonormal basis for $\operatorname{Im} Q$.

(c) Show there exists an orthonormal basis $\{v_i\}$ for $V$ such that the first $k$ columns of $[Q]_v$ form an orthonormal set, and the remaining columns are zero.

(d) Let $\{u_1, \dots, u_r\}$ be an orthonormal basis for $(\operatorname{Im} Q)^\perp$. Show that $\{Qw_1, \dots, Qw_k, u_1, \dots, u_r\}$ is an orthonormal basis for $V$.

(e) Suppose $T \in L(V)$ satisfies $T(Qw_i) = w_i$ (for $1 \le i \le k$) and $Tu_i = 0$ (for $1 \le i \le r$). Show that $T$ is well-defined, and that $T = Q^\dagger$.

(f) Show that $Q^\dagger$ is a partial isometry.

6. Let $V$ be a complex inner product space, and suppose $H \in L(V)$ is Hermitian. Show that:

(a) $\|v + iHv\| = \|v - iHv\|$ for all $v \in V$.

(b) $u + iHu = v + iHv$ if and only if $u = v$.

(c) $1 + iH$ and $1 - iH$ are nonsingular.

(d) If $V$ is finite-dimensional, then $U = (1 - iH)(1 + iH)^{-1}$ is a unitary operator. ($U$ is called the Cayley transform of $H$. This result is also true in an infinite-dimensional Hilbert space but the proof is considerably more difficult.)



10.3 NORMAL OPERATORS 

We now turn our attention to characterizing the type of operator on V for 
which there exists an orthonormal basis of eigenvectors for V. We begin by 
taking a look at some rather simple properties of the eigenvalues and eigen- 
vectors of the operators we have been discussing. 

To simplify our terminology, we remark that a complex inner product space is also called a unitary space, while a real inner product space is sometimes called a Euclidean space. If $H$ is an operator such that $H^\dagger = -H$, then $H$ is said to be anti-Hermitian (or skew-Hermitian). Furthermore, if $P$ is an operator such that $P = S^\dagger S$ for some operator $S$, then we say that $P$ is positive (or positive semidefinite or nonnegative). If $S$ also happens to be nonsingular (and hence $P$ is also nonsingular), then we say that $P$ is positive definite. Note that a positive operator is necessarily Hermitian since $(S^\dagger S)^\dagger = S^\dagger S$. The reason that $P$ is called positive is shown in part (d) of the following theorem.

Theorem 10.9 (a) The eigenvalues of a Hermitian operator are real.

(b) The eigenvalues of an isometry (and hence also of a unitary transformation) have absolute value one.

(c) The eigenvalues of an anti-Hermitian operator are pure imaginary.

(d) A positive (positive definite) operator has eigenvalues that are real and nonnegative (positive).

Proof (a) If $H$ is Hermitian, $v \neq 0$, and $Hv = \lambda v$, we have
$$\lambda\langle v, v\rangle = \langle v, \lambda v\rangle = \langle v, Hv\rangle = \langle H^\dagger v, v\rangle = \langle Hv, v\rangle = \langle \lambda v, v\rangle = \lambda^*\langle v, v\rangle.$$
But $\langle v, v\rangle \neq 0$, and hence $\lambda = \lambda^*$.






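(b) If $Q$ is an isometry, $v \neq 0$, and $Qv = \lambda v$, then we have (using Theorem 2.17)
$$\|v\| = \|Qv\| = \|\lambda v\| = |\lambda|\,\|v\|.$$
But $\|v\| \neq 0$, and hence $|\lambda| = 1$.

(c) If $H^\dagger = -H$, $v \neq 0$, and $Hv = \lambda v$, then
$$\lambda\langle v, v\rangle = \langle v, \lambda v\rangle = \langle v, Hv\rangle = \langle H^\dagger v, v\rangle = \langle -Hv, v\rangle = \langle -\lambda v, v\rangle = -\lambda^*\langle v, v\rangle.$$
But $\langle v, v\rangle \neq 0$, and hence $\lambda = -\lambda^*$. This shows that $\lambda$ is pure imaginary.

(d) Let $P = S^\dagger S$ be a positive definite operator. If $v \neq 0$, then the fact that $S$ is nonsingular means that $Sv \neq 0$, and hence $\langle Sv, Sv\rangle = \|Sv\|^2 > 0$. Then, for $Pv = (S^\dagger S)v = \lambda v$, we see that
$$\lambda\langle v, v\rangle = \langle v, \lambda v\rangle = \langle v, Pv\rangle = \langle v, (S^\dagger S)v\rangle = \langle Sv, Sv\rangle.$$
But $\langle v, v\rangle = \|v\|^2 > 0$ also, and therefore we must have $\lambda > 0$.

If $P$ is only assumed to be positive, then $S$ may be singular, and the only difference is that now for $v \neq 0$ we have $\langle Sv, Sv\rangle = \|Sv\|^2 \ge 0$, which implies that $\lambda \ge 0$. ∎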
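Each part of Theorem 10.9 can be seen on randomly generated matrices. The constructions are assumptions of this sketch rather than anything from the text: $(M + M^\dagger)/2$ is Hermitian, $(M - M^\dagger)/2$ is anti-Hermitian, a QR factor is unitary, and $S^\dagger S$ is positive for any $S$. Since `np.linalg.eigvals` is a general (non-Hermitian) solver, the comparisons allow for rounding.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

H = (M + M.conj().T) / 2          # Hermitian: H^† = H
K = (M - M.conj().T) / 2          # anti-Hermitian: K^† = -K
U, _ = np.linalg.qr(M)            # unitary
P = S.conj().T @ S                # positive: P = S^† S

part_a = np.allclose(np.linalg.eigvals(H).imag, 0.0)          # real eigenvalues
part_b = np.allclose(np.abs(np.linalg.eigvals(U)), 1.0)       # |lambda| = 1
part_c = np.allclose(np.linalg.eigvals(K).real, 0.0)          # pure imaginary
ev_P = np.linalg.eigvals(P)
part_d = np.allclose(ev_P.imag, 0.0) and bool(np.all(ev_P.real > -1e-10))  # real, >= 0 up to rounding

print(part_a, part_b, part_c, part_d)
```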

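We say that an operator $N$ is normal if $N^\dagger N = NN^\dagger$. Note this implies that for any $v \in V$ we have
$$\|Nv\|^2 = \langle Nv, Nv\rangle = \langle (N^\dagger N)v, v\rangle = \langle (NN^\dagger)v, v\rangle = \langle N^\dagger v, N^\dagger v\rangle = \|N^\dagger v\|^2.$$
Now let $\lambda$ be a complex number. It is easy to see that if $N$ is normal then so is $N - \lambda 1$ since (from Theorem 10.3)
$$\begin{aligned}
(N - \lambda 1)^\dagger (N - \lambda 1) &= (N^\dagger - \lambda^* 1)(N - \lambda 1) = N^\dagger N - \lambda N^\dagger - \lambda^* N + \lambda^*\lambda 1 \\
&= (N - \lambda 1)(N^\dagger - \lambda^* 1) = (N - \lambda 1)(N - \lambda 1)^\dagger.
\end{aligned}$$
Using $N - \lambda 1$ instead of $N$ in the previous result we obtain
$$\|Nv - \lambda v\|^2 = \|N^\dagger v - \lambda^* v\|^2.$$
Since the norm is positive definite, this equation proves the next theorem.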
Theorem 10.10 Let $N$ be a normal operator and let $\lambda$ be an eigenvalue of $N$. Then $Nv = \lambda v$ if and only if $N^\dagger v = \lambda^* v$.

In words, if $v$ is an eigenvector of a normal operator $N$ with eigenvalue $\lambda$, then $v$ is also an eigenvector of $N^\dagger$ with eigenvalue $\lambda^*$. (Note it is always true, on a finite-dimensional space, that if $\lambda$ is an eigenvalue of an operator $T$, then $\lambda^*$ will be an eigenvalue of $T^\dagger$. See Exercise 10.3.6.)

Corollary If $N$ is normal and $Nv = 0$ for some $v \in V$, then $N^\dagger v = 0$.

Proof This follows from Theorem 10.10 by taking $\lambda = \lambda^* = 0$. Alternatively, using $N^\dagger N = NN^\dagger$ along with the fact that $Nv = 0$, we see that
$$\langle N^\dagger v, N^\dagger v\rangle = \langle v, (NN^\dagger)v\rangle = \langle v, (N^\dagger N)v\rangle = 0.$$
Since the inner product is positive definite, this requires that $N^\dagger v = 0$. ∎

Theorem 10.11 (a) Eigenvectors belonging to distinct eigenvalues of a Hermitian operator are orthogonal.

(b) Eigenvectors belonging to distinct eigenvalues of an isometric operator are orthogonal. Hence eigenvectors belonging to distinct eigenvalues of a unitary operator are orthogonal.

(c) Eigenvectors belonging to distinct eigenvalues of a normal operator are orthogonal.

Proof As we note after the proof, Hermitian and unitary operators are special cases of normal operators, and hence parts (a) and (b) follow from part (c). However, it is instructive to give independent proofs of parts (a) and (b). Assume that $T$ is an operator on a unitary space, and $Tv_i = \lambda_i v_i$ for $i = 1, 2$ with $\lambda_1 \neq \lambda_2$. We may then also assume without loss of generality that $\lambda_1 \neq 0$.

(a) If $T = T^\dagger$, then (using Theorem 10.9(a))
$$\begin{aligned}
\lambda_2\langle v_1, v_2\rangle &= \langle v_1, \lambda_2 v_2\rangle = \langle v_1, Tv_2\rangle = \langle T^\dagger v_1, v_2\rangle = \langle Tv_1, v_2\rangle \\
&= \langle \lambda_1 v_1, v_2\rangle = \lambda_1^*\langle v_1, v_2\rangle = \lambda_1\langle v_1, v_2\rangle.
\end{aligned}$$
But $\lambda_1 \neq \lambda_2$, and hence $\langle v_1, v_2\rangle = 0$.

(b) If $T$ is isometric, then $T^\dagger T = 1$ and we have
$$\langle v_1, v_2\rangle = \langle v_1, (T^\dagger T)v_2\rangle = \langle Tv_1, Tv_2\rangle = \lambda_1^*\lambda_2\langle v_1, v_2\rangle.$$
But by Theorem 10.9(b) we have $|\lambda_1|^2 = \lambda_1^*\lambda_1 = 1$, and thus $\lambda_1^* = 1/\lambda_1$. Therefore, multiplying the above equation by $\lambda_1$, we see that $\lambda_1\langle v_1, v_2\rangle = \lambda_2\langle v_1, v_2\rangle$ and hence, since $\lambda_1 \neq \lambda_2$, this shows that $\langle v_1, v_2\rangle = 0$.

(c) If $T$ is normal, then
$$\langle v_1, Tv_2\rangle = \lambda_2\langle v_1, v_2\rangle$$
while on the other hand, using Theorem 10.10 we have
$$\langle v_1, Tv_2\rangle = \langle T^\dagger v_1, v_2\rangle = \langle \lambda_1^* v_1, v_2\rangle = \lambda_1\langle v_1, v_2\rangle.$$
Thus $\langle v_1, v_2\rangle = 0$ since $\lambda_1 \neq \lambda_2$. ∎

We note that if $H^\dagger = H$, then $H^\dagger H = HH = HH^\dagger$, so that any Hermitian operator is normal. Furthermore, if $U$ is unitary, then $U^\dagger U = UU^\dagger$ ($= 1$), so that $U$ is also normal.

A Hermitian operator $T$ defined on a real inner product space is said to be symmetric. This is equivalent to requiring that with respect to an orthonormal basis, the matrix elements $a_{ij}$ of $T$ are given by
$$a_{ij} = \langle e_i, Te_j\rangle = \langle Te_i, e_j\rangle = \langle e_j, Te_i\rangle = a_{ji}.$$
Therefore, a symmetric operator is represented by a real symmetric matrix. It is also true that antisymmetric operators (i.e., $T^T = -T$) and anti-Hermitian operators ($H^\dagger = -H$) are normal. Therefore, part (a) and the unitary case in part (b) in the above theorem are really special cases of part (c).
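Theorems 10.10 and 10.11(c) can be watched on a normal matrix that is neither Hermitian nor unitary. As an assumption of this sketch, we build $N = W\,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\,W^\dagger$ with $W$ unitary and distinct complex $\lambda_i$ (values chosen arbitrarily); such an $N$ satisfies $N^\dagger N = NN^\dagger$. The unit eigenvectors returned by `np.linalg.eig` then come out mutually orthogonal, and $N^\dagger$ has the same eigenvectors with conjugated eigenvalues. A non-normal triangular matrix is included for contrast.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
W, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lams = np.array([1.0 + 2.0j, -0.5 + 0.0j, 3.0 - 1.0j])     # distinct eigenvalues
N = W @ np.diag(lams) @ W.conj().T
N_dag = N.conj().T

is_normal = np.allclose(N_dag @ N, N @ N_dag)

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
same_norm = np.isclose(np.linalg.norm(N @ v), np.linalg.norm(N_dag @ v))  # ||Nv|| = ||N^† v||

vals, vecs = np.linalg.eig(N)          # columns of vecs are unit eigenvectors
# Theorem 10.10: N^† v = lambda* v for each eigenvector v of N.
adjoint_eigs = all(np.allclose(N_dag @ vecs[:, k], np.conj(vals[k]) * vecs[:, k])
                   for k in range(n))
# Theorem 10.11(c): eigenvectors for distinct eigenvalues are orthogonal.
orthogonal = np.allclose(vecs.conj().T @ vecs, np.eye(n))

# Contrast: a non-normal matrix whose eigenvectors are not orthogonal.
T = np.array([[1.0, 1.0], [0.0, 2.0]])
_, t_vecs = np.linalg.eig(T)
t_orthogonal = np.allclose(t_vecs.T @ t_vecs, np.eye(2))

print(is_normal, same_norm, adjoint_eigs, orthogonal, t_orthogonal)  # True True True True False
```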

Theorem 10.12 (a) Let $T$ be an operator on a unitary space $V$, and let $W$ be a $T$-invariant subspace of $V$. Then $W^\perp$ is invariant under $T^\dagger$.

(b) Let $U$ be a unitary operator on a unitary space $V$, and let $W$ be a $U$-invariant subspace of $V$. Then $W^\perp$ is also invariant under $U$.

Proof (a) For any $v \in W$ we have $Tv \in W$ since $W$ is $T$-invariant. Let $w \in W^\perp$ be arbitrary. We must show that $T^\dagger w \in W^\perp$. But this is easy because
$$\langle T^\dagger w, v\rangle = \langle w, Tv\rangle = 0$$
by definition of $W^\perp$. Thus $T^\dagger w \in W^\perp$, so that $W^\perp$ is invariant under $T^\dagger$.

(b) The fact that $U$ is unitary means $U^{-1} = U^\dagger$ exists, and hence $U$ is nonsingular. Since $U$ maps $W$ into itself and is one-to-one, it maps $W$ onto itself whenever $W$ is finite-dimensional. In other words, for any $v' \in W$ there exists $v \in W$ such that $Uv = v'$. Now let $w \in W^\perp$ be arbitrary. Then
$$\langle Uw, v'\rangle = \langle Uw, Uv\rangle = \langle w, (U^\dagger U)v\rangle = \langle w, v\rangle = 0$$
by definition of $W^\perp$. Thus $Uw \in W^\perp$, so that $W^\perp$ is invariant under $U$. ∎
Recall from the discussion in Section 7.7 that the algebraic multiplicity of a given eigenvalue is the number of times the eigenvalue is repeated as a root of the characteristic polynomial. We also defined the geometric multiplicity as the number of linearly independent eigenvectors corresponding to this eigenvalue (i.e., the dimension of its eigenspace).

Theorem 10.13 Let $H$ be a Hermitian operator on a finite-dimensional unitary space $V$. Then the algebraic multiplicity of any eigenvalue $\lambda$ of $H$ is equal to its geometric multiplicity.

Proof Let $V_\lambda = \{v \in V : Hv = \lambda v\}$ be the eigenspace corresponding to the eigenvalue $\lambda$. Clearly $V_\lambda$ is invariant under $H$ since $Hv = \lambda v \in V_\lambda$ for every $v \in V_\lambda$. By Theorem 10.12(a), we then have that $V_\lambda^\perp$ is also invariant under $H^\dagger = H$. Furthermore, by Theorem 2.22 we see that $V = V_\lambda \oplus V_\lambda^\perp$. Applying Theorem 7.20, we may write $H = H_1 \oplus H_2$ where $H_1 = H|V_\lambda$ and $H_2 = H|V_\lambda^\perp$.

Let $A$ be the matrix representation of $H$, and let $A_i$ be the matrix representation of $H_i$ ($i = 1, 2$). By Theorem 7.20, we also have $A = A_1 \oplus A_2$. Using Theorem 4.14, it then follows that the characteristic polynomial of $A$ is given by
$$\det(xI - A) = \det(xI - A_1)\det(xI - A_2).$$
Now, $H_1$ is a Hermitian operator on the finite-dimensional space $V_\lambda$ with only the single eigenvalue $\lambda$. Therefore $\lambda$ is the only root of $\det(xI - A_1) = 0$, and hence it must occur with an algebraic multiplicity equal to the dimension of $V_\lambda$ (since this is just the size of the matrix $A_1$). In other words, if $\dim V_\lambda = m$, then $\det(xI - A_1) = (x - \lambda)^m$. On the other hand, $\lambda$ is not an eigenvalue of $A_2$ (an eigenvector of $H_2$ with eigenvalue $\lambda$ would lie in $V_\lambda \cap V_\lambda^\perp = \{0\}$), and hence $\det(\lambda I - A_2) \neq 0$. This means that $\det(xI - A)$ contains $(x - \lambda)$ as a factor exactly $m$ times. ∎

Corollary Any Hermitian operator $H$ on a finite-dimensional unitary space $V$ is diagonalizable.

Proof Since $V$ is a unitary space, the characteristic polynomial of $H$ will factor into (not necessarily distinct) linear terms. The conclusion then follows from Theorems 10.13 and 7.26. ∎

In fact, from Theorem 8.2 we know that any normal matrix is unitarily similar to a diagonal matrix. This means that given any normal operator $T \in L(V)$, there is an orthonormal basis for $V$ that consists of eigenvectors of $T$. We develop this result from an entirely different point of view in the next section.
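For Hermitian matrices, NumPy's `np.linalg.eigh` returns real eigenvalues together with an orthonormal set of eigenvectors, which is the finite-dimensional content of Theorem 10.13 and its corollary. In the sketch we deliberately give $H$ a repeated eigenvalue $2$ (an illustrative choice), then compare the number of computed eigenvalues equal to $2$ with $\dim \operatorname{Ker}(H - 2I)$, and check that $W^\dagger HW$ is diagonal. The rank tolerance `1e-8` is our own choice for separating rounding noise from genuine nonzero singular values.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
W0, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
H = W0 @ np.diag([2.0, 2.0, -1.0, 5.0]) @ W0.conj().T   # Hermitian; eigenvalue 2 repeated
H = (H + H.conj().T) / 2                                  # remove rounding asymmetry

vals, W = np.linalg.eigh(H)       # real eigenvalues in ascending order, orthonormal columns

# Algebraic multiplicity of 2: how many computed eigenvalues equal 2.
algebraic = int(np.sum(np.isclose(vals, 2.0)))
# Geometric multiplicity of 2: dim Ker(H - 2I) = n - rank(H - 2I).
geometric = n - int(np.linalg.matrix_rank(H - 2.0 * np.eye(n), tol=1e-8))

orthonormal = np.allclose(W.conj().T @ W, np.eye(n))
diagonalized = np.allclose(W.conj().T @ H @ W, np.diag(vals))
print(algebraic, geometric, orthonormal, diagonalized)   # 2 2 True True
```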



Exercises 

1. Let $V$ be a unitary space and suppose $T \in L(V)$. Define $T_+ = (1/2)(T + T^\dagger)$ and $T_- = (1/2i)(T - T^\dagger)$.

(a) Show that $T_+$ and $T_-$ are Hermitian, and that $T = T_+ + iT_-$.

(b) If $T'_+$ and $T'_-$ are Hermitian operators such that $T = T'_+ + iT'_-$, show that $T'_+ = T_+$ and $T'_- = T_-$.

(c) Prove that $T$ is normal if and only if $[T_+, T_-] = 0$.

2. Let $N$ be a normal operator on a finite-dimensional inner product space $V$. Prove $\operatorname{Ker} N = \operatorname{Ker} N^\dagger$ and $\operatorname{Im} N = \operatorname{Im} N^\dagger$. [Hint: Prove that $(\operatorname{Im} N^\dagger)^\perp = \operatorname{Ker} N$, and hence that $\operatorname{Im} N^\dagger = (\operatorname{Ker} N)^\perp$.]

3. Let $V$ be a finite-dimensional inner product space, and suppose $T \in L(V)$ is both positive and unitary. Prove that $T = 1$.

4. Let $H \in M_n(\mathbb{C})$ be Hermitian. Then for any nonzero $x \in \mathbb{C}^n$ we define the Rayleigh quotient to be the number
$$R(x) = \frac{\langle x, Hx\rangle}{\|x\|^2}.$$
Prove that $\max\{R(x) : x \neq 0\}$ is the largest eigenvalue of $H$, and that $\min\{R(x) : x \neq 0\}$ is the smallest eigenvalue of $H$.

5. Let $V$ be a finite-dimensional unitary space, and suppose $E \in L(V)$ is such that $E^2 = E = E^\dagger$. Prove that $V = \operatorname{Im} E \oplus (\operatorname{Im} E)^\perp$.

6. If $V$ is finite-dimensional and $T \in L(V)$ has eigenvalue $\lambda$, show that $T^\dagger$ has eigenvalue $\lambda^*$.

10.4 DIAGONALIZATION OF NORMAL OPERATORS 

We now turn to the problem of diagonalizing operators. We will discuss sev- 
eral of the many ways to approach this problem. Because most commonly 
used operators are normal, we first treat this general case in detail, leaving 
unitary and Hermitian operators as obvious special cases. Next, we go back 
and consider the real and complex cases separately. In so doing, we will gain
much insight into the structure of orthogonal and unitary transformations. 
While this problem was treated concisely in Chapter 8, we present an entirely 
different viewpoint in this section to acquaint the reader with other approaches 
found in the literature. If the reader has studied Chapter 8, he or she should 
keep in mind the rational and Jordan forms while reading this section, as many 
of our results (such as Theorem 10.16) follow almost trivially from our earlier 
work. We begin with some more elementary facts about normal transforma- 
tions. 

Theorem 10.14 Let V be a unitary space.

(a) If $T \in L(V)$ and $(T^\dagger T)v = 0$ for some $v \in V$, then $Tv = 0$.

(b) If H is Hermitian and $H^k v = 0$ for $k \geq 1$, then $Hv = 0$.

(c) If N is normal and $N^k v = 0$ for $k \geq 1$, then $Nv = 0$.

(d) If N is normal, and if $(N - \lambda 1)^k v = 0$ where $k \geq 1$ and $\lambda \in \mathbb{C}$, then
$Nv = \lambda v$.

Proof (a) Since $(T^\dagger T)v = 0$, we have $0 = \langle v, (T^\dagger T)v\rangle = \langle Tv, Tv\rangle$, which
implies that $Tv = 0$ because the inner product is positive definite.

(b) We first show that if $H^{2^m}v = 0$ for some positive integer m, then $Hv = 0$. To see this, let $T = H^{2^{m-1}}$ and note that $T^\dagger = T$ because H is Hermitian (by
induction from Theorem 10.3(c)). Then $T^\dagger T = TT = H^{2^m}$, and hence

$$0 = \langle H^{2^m}v, v\rangle = \langle (T^\dagger T)v, v\rangle = \langle Tv, Tv\rangle$$

which implies that $0 = Tv = H^{2^{m-1}}v$. Repeating this process, we must eventually obtain $Hv = 0$.

Now, if $H^k v = 0$, then $H^{2^m}v = 0$ for any $2^m \geq k$, and therefore applying the
above argument, we see that $Hv = 0$.

(c) Define the Hermitian operator $H = N^\dagger N$. Since N is normal, we see
that

$$(N^\dagger N)^2 = N^\dagger N N^\dagger N = N^{\dagger 2} N^2$$

and by induction,

$$(N^\dagger N)^k = N^{\dagger k} N^k .$$

By hypothesis, we then find that

$$H^k v = (N^\dagger N)^k v = (N^{\dagger k} N^k)v = 0$$






and hence $(N^\dagger N)v = Hv = 0$ by part (b). But then $Nv = 0$ by part (a).

(d) Since N is normal, it follows that $N - \lambda 1$ is normal, and therefore by
part (c) we have $(N - \lambda 1)v = 0$. ∎
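Normality cannot be dropped from part (c). The following Python sketch (standard library only; the helper names are our own) checks that the nilpotent matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ satisfies $N^2 v = 0$ with $Nv \neq 0$ for $v = e_2$, and confirms that this N is not normal.

```python
def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A^dagger."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def is_normal(A, tol=1e-12):
    """True when A^dagger A and A A^dagger agree entrywise (to within tol)."""
    left, right = matmul(adjoint(A), A), matmul(A, adjoint(A))
    n = len(A)
    return all(abs(left[i][j] - right[i][j]) <= tol for i in range(n) for j in range(n))


N = [[0, 1],
     [0, 0]]            # nilpotent: N^2 = 0
v = [[0], [1]]          # the column vector e_2

print(is_normal(N))                  # False
print(matmul(matmul(N, N), v))       # [[0], [0]]   so N^2 v = 0 ...
print(matmul(N, v))                  # [[1], [0]]   ... yet N v != 0
print(is_normal([[1, -1], [1, 1]]))  # True (a normal, non-Hermitian matrix)
```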

Just as we did for operators, we say that a matrix N is normal if $N^\dagger N = NN^\dagger$. We now wish to show that any normal matrix can be diagonalized by a
unitary similarity transformation. Another way to phrase this is as follows. We
say that two matrices $A, B \in M_n(\mathbb{C})$ are unitarily similar (or equivalent) if
there exists a unitary matrix $U \in M_n(\mathbb{C})$ such that $A = U^\dagger BU = U^{-1}BU$. Thus,
we wish to show that any normal matrix is unitarily similar to a diagonal 
matrix. This extremely important result is quite easy to prove with what has 
already been shown. Let us first prove this in the case of normal operators 
over the complex field. (See Theorem 8.2 for another approach.) 

Theorem 10.15 Let N be a normal operator on a finite-dimensional unitary 
space V. Then there exists an orthonormal basis for V consisting of eigenvec- 
tors of N in which the matrix of N is diagonal. 

Proof Let $\lambda_1, \dots, \lambda_r$ be the distinct eigenvalues of the normal operator N.
(These all exist in $\mathbb{C}$ by Theorems 6.12 and 6.13.) Then (by Theorem 7.13) the
minimal polynomial m(x) for N must be of the form

$$m(x) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_r)^{n_r}$$

where each $n_i \geq 1$. By the primary decomposition theorem (Theorem 7.23), we
can write $V = W_1 \oplus \cdots \oplus W_r$ where $W_i = \operatorname{Ker}(N - \lambda_i 1)^{n_i}$. In other words,

$$(N - \lambda_i 1)^{n_i} v_i = 0$$

for every $v_i \in W_i$. By Theorem 10.14(d), we then have $Nv_i = \lambda_i v_i$ so that
every nonzero $v_i \in W_i$ is an eigenvector of N with eigenvalue $\lambda_i$.

Now, the inner product on V induces an inner product on each subspace
$W_i$ in the usual and obvious way, and thus by the Gram-Schmidt process
(Theorem 2.21), each $W_i$ has an orthonormal basis relative to this induced
inner product. Note that by the last result of the previous paragraph, this basis
must consist of eigenvectors of N.

By Theorem 10.11(c), vectors in distinct $W_i$ are orthogonal to each other.
Therefore, according to Theorem 2.15, the union of the bases of the $W_i$ forms
a basis for V, which thus consists entirely of eigenvectors of N. By Theorem
7.14 then, the matrix of N is diagonal in this basis. (Alternatively, we see that
the matrix elements $n_{ij}$ of N relative to the eigenvector basis $\{e_i\}$ are given by
$n_{ij} = \langle e_i, Ne_j\rangle = \langle e_i, \lambda_j e_j\rangle = \lambda_j \delta_{ij}$.) ∎
Corollary 1 Let N be a normal matrix over $\mathbb{C}$. Then there exists a unitary
matrix U such that $U^{-1}NU = U^\dagger NU$ is diagonal. Moreover, the columns of U
are just the eigenvectors of N, and the diagonal elements of $U^\dagger NU$ are the
eigenvalues of N.

Proof The normal matrix N defines an operator on a finite-dimensional
unitary space V with the standard orthonormal basis, and therefore by
Theorem 10.15, V has an orthonormal basis of eigenvectors in which the
matrix N is diagonal. By Theorem 10.6, any such change of basis in V is
accomplished by a unitary transformation U, and by Theorem 5.18, the matrix
of the operator relative to this new basis is related to the matrix N in the old
basis by the similarity transformation $U^{-1}NU$ ($= U^\dagger NU$).

Now note that the columns of U are precisely the eigenvectors of N (see
the discussion preceding Example 7.4). We also recall that Theorem 7.14 tells
us that the diagonal elements of the diagonal form of N are exactly the eigenvalues of N. ∎
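To see Corollary 1 at work on a matrix that is normal but not Hermitian, the following Python sketch (standard library only, with floating-point tolerances; the example matrix and helper names are our own) forms U from orthonormal eigenvectors of $N = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ and checks that U is unitary and that $U^\dagger NU = \operatorname{diag}(1 + i, 1 - i)$.

```python
import math


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A^dagger."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices within tol."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# N is normal (N^dagger N = N N^dagger = 2I) but not Hermitian.
N = [[1, -1],
     [1, 1]]

s = 1 / math.sqrt(2)
# Columns of U: (1, -i)/sqrt(2) for eigenvalue 1 + i, and (1, i)/sqrt(2) for 1 - i.
U = [[s, s],
     [-1j * s, 1j * s]]

D = matmul(matmul(adjoint(U), N), U)
print(close(matmul(adjoint(U), U), [[1, 0], [0, 1]]))   # True: U is unitary
print(close(D, [[1 + 1j, 0], [0, 1 - 1j]]))             # True: U^dagger N U is diagonal
```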

Corollary 2 A real symmetric matrix can be diagonalized by an orthogonal 
matrix. 

Proof Note that a real symmetric matrix A may be considered as an operator
on a finite-dimensional real inner product space V. If we think of A as a complex matrix that happens to have all real elements, then A is Hermitian and
hence has all real eigenvalues. This means that all the roots of the minimal
polynomial for A lie in $\mathbb{R}$. If $\lambda_1, \dots, \lambda_r$ are the distinct eigenvalues of A, then
we may proceed exactly as in the proof of Theorem 10.15 and Corollary 1 to
conclude that there exists a unitary matrix U that diagonalizes A. In this case,
since $W_i = \operatorname{Ker}(A - \lambda_i I)^{n_i}$ and $A - \lambda_i I$ is real, each $W_i$ has a basis of real vectors, so the eigenvectors of A (and the orthonormal bases produced by the Gram-Schmidt process) may be chosen real, and hence U is actually an orthogonal matrix. ∎

Corollary 2 is also proved from an entirely different point of view in 
Exercise 10.4.9. This alternative approach has the advantage of presenting a 
very useful geometric picture of the diagonalization process. 

Example 10.6 Let us diagonalize the real symmetric matrix

$$A = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix} .$$

The characteristic polynomial of A is 
$$\Delta_A(x) = \det(xI - A) = (x - 2)(x - 5) - 4 = (x - 1)(x - 6)$$

and therefore the eigenvalues of A are 1 and 6. To find the eigenvectors of A,
we must solve the matrix equation $(\lambda_i I - A)v_i = 0$ for the vector $v_i$. For $\lambda_1 = 1$
we have $v_1 = (x_1, y_1)$, and hence we find the homogeneous system of equations

$$\begin{aligned} -x_1 + 2y_1 &= 0 \\ 2x_1 - 4y_1 &= 0 . \end{aligned}$$

These imply that $x_1 = 2y_1$, and hence a nonzero solution is $v_1 = (2, 1)$. For $\lambda_2 = 6$ we have the equations

$$\begin{aligned} 4x_2 + 2y_2 &= 0 \\ 2x_2 + y_2 &= 0 \end{aligned}$$

which yields $v_2 = (1, -2)$.

Note that $\langle v_1, v_2\rangle = 0$ as it should according to Theorem 10.11, and that
$\|v_1\| = \sqrt{5} = \|v_2\|$. We then take the normalized basis vectors to be $e_i = v_i/\sqrt{5}$,
which are also eigenvectors of A. Finally, A is diagonalized by the orthogonal
matrix P whose columns are just the $e_i$:

$$P = \begin{pmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ 1/\sqrt{5} & -2/\sqrt{5} \end{pmatrix} .$$

We leave it to the reader to show that

$$P^T A P = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix} .$$
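The computation left to the reader can be checked exactly. In the Python sketch below (standard library only; names are illustrative), we work with the unnormalized eigenvectors $v_1, v_2$ as the columns of a matrix V, so that $P = V/\sqrt{5}$ and $P^TAP = (V^TAV)/5$; this keeps all arithmetic rational.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[2, -2],
     [-2, 5]]
V = [[2, 1],
     [1, -2]]     # columns are v1 = (2, 1) and v2 = (1, -2)

# A v_i = lambda_i v_i, and V^T V = 5I (orthogonal columns of norm sqrt(5)).
AV = matmul(A, V)
print(AV)                          # [[2, 6], [1, -12]]: columns 1*v1 and 6*v2
print(matmul(transpose(V), V))     # [[5, 0], [0, 5]]

# With P = V / sqrt(5) we get P^T A P = (V^T A V) / 5, which stays rational.
PtAP = [[Fraction(x, 5) for x in row] for row in matmul(transpose(V), AV)]
print(PtAP == [[1, 0], [0, 6]])    # True
```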



Another important point to notice is that Theorem 10.15 tells us that even
though an eigenvalue $\lambda$ of a normal operator N may be degenerate (i.e., have
algebraic multiplicity $k > 1$), it is always possible to find k linearly independent eigenvectors belonging to $\lambda$. The easiest way to see this is to note that
from Theorem 10.8 we have $|\det U| = 1 \neq 0$ for any unitary matrix U. This
means that the columns of the diagonalizing matrix U (which are just the
eigenvectors of N) must be linearly independent. This is in fact another proof
that the algebraic and geometric multiplicities of a normal (and hence
Hermitian) operator must be the same.
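The equality of multiplicities can be probed by computing the geometric multiplicity $n - \operatorname{rank}(A - \lambda I)$. The Python sketch below uses exact rational Gaussian elimination (the helper names and both test matrices are our own choices) to compare a real symmetric matrix whose eigenvalue 2 has algebraic multiplicity 2 with a non-normal Jordan block having the same repeated eigenvalue.

```python
from fractions import Fraction


def rank(M):
    """Rank via Gaussian elimination over the rationals."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def geometric_multiplicity(A, lam):
    """dim Ker(A - lam I) = n - rank(A - lam I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


# Real symmetric (hence normal), eigenvalues 5, 2, 2.
S = [[3, 1, 1],
     [1, 3, 1],
     [1, 1, 3]]
# Non-normal Jordan block, eigenvalues 2, 2.
J = [[2, 1],
     [0, 2]]

print(geometric_multiplicity(S, 2))   # 2: matches the algebraic multiplicity
print(geometric_multiplicity(J, 2))   # 1: smaller than the algebraic multiplicity 2
```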

We now consider the case of real orthogonal transformations as independent operators, not as a special case of normal operators. First we need a general definition. Let V be an arbitrary finite-dimensional vector space over any
field $\mathcal{F}$, and suppose $T \in L(V)$. A nonzero T-invariant subspace $W \subset V$ is said
to be irreducible if the only T-invariant subspaces contained in W are $\{0\}$
and W.

Theorem 10.16 (a) Let V be a finite-dimensional vector space over an
algebraically closed field $\mathcal{F}$, and suppose $T \in L(V)$. Then every irreducible T-invariant subspace W of V is of dimension 1.

(b) Let V be a finite-dimensional vector space over $\mathbb{R}$, and suppose $T \in L(V)$. Then every irreducible T-invariant subspace W of V is of dimension
either 1 or 2.

Proof (a) Let W be an irreducible T-invariant subspace of V. Then the
restriction $T_W$ of T to W is just a linear transformation on W, where $T_W(w) = Tw \in W$ for every $w \in W$. Since $\mathcal{F}$ is algebraically closed, the characteristic
polynomial of $T_W$ has at least one root (i.e., eigenvalue $\lambda$) in $\mathcal{F}$. Therefore T
has at least one (nonzero) eigenvector $v \in W$ such that $Tv = \lambda v \in W$. If we
define $\mathcal{L}(v)$ to be the linear span of $\{v\}$, then $\mathcal{L}(v)$ is also a T-invariant subspace of W, and hence $\mathcal{L}(v) = W$ because W is irreducible. Therefore W is
spanned by the single vector v, and hence $\dim W = 1$.

(b) Let W be an irreducible T-invariant subspace of V, and let m(x) be the
minimal polynomial for $T_W$. The fact that W is irreducible (so that
W is not a direct sum of T-invariant subspaces) along with the primary
decomposition theorem (Theorem 7.23) tells us that we must have $m(x) = f(x)^n$ where $f(x) \in \mathbb{R}[x]$ is a prime polynomial. Furthermore, if n were greater
than 1, then we claim that $\operatorname{Ker} f(T)^{n-1}$ would be a T-invariant subspace of W
(Theorem 7.18) that is different from $\{0\}$ and W.

To see this, first suppose that $\operatorname{Ker} f(T)^{n-1} = \{0\}$. Then the linear transformation $f(T)^{n-1}$ is one-to-one, and hence $f(T)^{n-1}(W) = W$. But then

$$0 = f(T)^n(W) = f(T)f(T)^{n-1}(W) = f(T)(W) .$$

However, $f(T)(W) \neq 0$ by definition of m(x), and hence this contradiction shows
that we cannot have $\operatorname{Ker} f(T)^{n-1} = \{0\}$. Next, if we had $\operatorname{Ker} f(T)^{n-1} = W$, this
would imply that $f(T)^{n-1}(W) = 0$, which contradicts the definition of minimal
polynomial. Therefore we must have $n = 1$ and $m(x) = f(x)$.

Since $m(x) = f(x)$ is prime, it follows from the corollary to Theorem 6.15
that we must have either $m(x) = x - a$ or $m(x) = x^2 + ax + b$ with $a^2 - 4b < 0$.
If $m(x) = x - a$, then there exists an eigenvector $v \in W$ with $Tv = av \in W$, and
hence $\mathcal{L}(v) = W$ as in part (a). If $m(x) = x^2 + ax + b$, then for any nonzero $w \in W$ we have

$$0 = m(T)w = T^2w + aTw + bw$$

and hence

$$T^2w = T(Tw) = -aTw - bw \in W .$$

Thus $\mathcal{L}(w, Tw)$ is a T-invariant subspace of W with dimension either 1 or 2.
However W is irreducible, and therefore we must have $W = \mathcal{L}(w, Tw)$. ∎

Theorem 10.17 Let V be a finite-dimensional Euclidean space, let $T \in L(V)$
be an orthogonal transformation, and let W be an irreducible T-invariant subspace of V. Then one of the following two conditions holds:

(a) $\dim W = 1$, and for any nonzero $w \in W$ we have $Tw = \pm w$.

(b) $\dim W = 2$, and there exists an orthonormal basis $\{e_1, e_2\}$ for W such
that the matrix representation of $T_W$ relative to this basis has the form

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} .$$

Proof That dim W equals 1 or 2 follows from Theorem 10.16(b). If $\dim W = 1$, then (since W is T-invariant) there exists $\lambda \in \mathbb{R}$ such that $Tw = \lambda w$ for any
(fixed) $w \in W$. But T is orthogonal so that

$$\|w\| = \|Tw\| = \|\lambda w\| = |\lambda|\,\|w\|$$

and hence $|\lambda| = 1$. This shows that $Tw = \lambda w = \pm w$.

If $\dim W = 2$, then the desired form of the matrix of $T_W$ follows essentially from Example 10.5. Alternatively, we know that W has an orthonormal
basis $\{e_1, e_2\}$ by the Gram-Schmidt process. If we write $Te_1 = ae_1 + be_2$, then
$\|Te_1\| = \|e_1\| = 1$ implies that $a^2 + b^2 = 1$. If we also write $Te_2 = ce_1 + de_2$, then
similarly $c^2 + d^2 = 1$. Using $\langle Te_1, Te_2\rangle = \langle e_1, e_2\rangle = 0$ we find $ac + bd = 0$. If $a \neq 0$, this gives
$c = -bd/a$. But then $1 = d^2(1 + b^2/a^2) = d^2/a^2$ so that $a^2 = d^2$ and $c^2 = b^2$. (If $a = 0$, then $b^2 = 1$, so $ac + bd = 0$ forces $d = 0$ and $c = \pm 1 = \pm b$, with the same conclusion.)
This means that $Te_2 = \pm(-be_1 + ae_2)$. If $Te_2 = -be_1 + ae_2$, then the matrix of T
is of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

and we may choose $\theta \in \mathbb{R}$ such that $a = \cos\theta$ and $b = \sin\theta$ (since $\det T = a^2 + b^2 = 1$). However, if $Te_2 = be_1 - ae_2$, then the matrix of T is

$$\begin{pmatrix} a & b \\ b & -a \end{pmatrix}$$

which satisfies the equation $x^2 - 1 = (x - 1)(x + 1) = 0$ (and has $\det T = -1$).
But if T satisfied this equation, then (by the primary decomposition theorem
(Theorem 7.23)) W would be the direct sum of $\operatorname{Ker}(T - 1)$ and $\operatorname{Ker}(T + 1)$, at least one of which is nonzero, so W would contain a one-dimensional T-invariant subspace, in contradiction to
the assumed irreducibility of W. Therefore only the first case can occur. ∎
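The dichotomy in the proof can be made concrete. The Python sketch below (standard library only; `classify_orthogonal_2x2` is a hypothetical helper, and it assumes its input is already orthogonal) reads off $a = \cos\theta$ and $b = \sin\theta$ from the first column when the determinant is 1, and otherwise reports the reflection form. For the sample reflection F below, the vector (1.6, 0.8) is fixed, so its span is a one-dimensional invariant subspace, exactly as the argument predicts.

```python
import math


def classify_orthogonal_2x2(Q, tol=1e-12):
    """Classify a 2x2 orthogonal matrix Q = [[a, c], [b, d]] as in Theorem 10.17's proof.

    Returns ("rotation", theta) when Q = [[cos t, -sin t], [sin t, cos t]], and
    ("reflection", None) when Q has the form [[a, b], [b, -a]] (det Q = -1).
    """
    (a, c), (b, d) = Q                 # first column (a, b), second column (c, d)
    det = a * d - b * c
    if abs(det - 1) < tol:
        return "rotation", math.atan2(b, a)   # a = cos(theta), b = sin(theta)
    if abs(det + 1) < tol:
        return "reflection", None
    raise ValueError("det Q is not +1 or -1, so Q is not orthogonal")


theta = 2 * math.pi / 3
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
F = [[0.6, 0.8],
     [0.8, -0.6]]                      # the form (a b; b -a) with a = 0.6, b = 0.8

kind, angle = classify_orthogonal_2x2(R)
print(kind, abs(angle - theta) < 1e-12)    # rotation True
print(classify_orthogonal_2x2(F))          # ('reflection', None)

# F fixes (a + 1, b) = (1.6, 0.8), so span{(1.6, 0.8)} is F-invariant.
w = [1.6, 0.8]
Fw = [F[0][0] * w[0] + F[0][1] * w[1], F[1][0] * w[0] + F[1][1] * w[1]]
print(all(abs(x - y) < 1e-12 for x, y in zip(Fw, w)))   # True
```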

This theorem becomes quite useful when combined with the next result. 

Theorem 10.18 Let T be an orthogonal operator on a finite-dimensional
Euclidean space V. Then $V = W_1 \oplus \cdots \oplus W_r$ where each $W_i$ is an irreducible
T-invariant subspace of V such that vectors belonging to distinct subspaces
$W_i$ and $W_j$ are orthogonal.

Proof If $\dim V = 1$ there is nothing to prove, so we assume $\dim V > 1$ and
that the theorem is true for all spaces of dimension less than $\dim V$. Let $W_1$ be
a nonzero T-invariant subspace of least dimension. Then $W_1$ is necessarily
irreducible. By Theorem 2.22 we know that $V = W_1 \oplus W_1^\perp$ where $\dim W_1^\perp < \dim V$, and hence we need only show that $W_1^\perp$ is also T-invariant. But this
follows from Theorem 10.12(b) applied to real unitary transformations (i.e.,
orthogonal transformations). This also means that $T(W_1^\perp) \subset W_1^\perp$, and hence
T is an orthogonal transformation on $W_1^\perp$ (since it takes vectors in $W_1^\perp$ to
vectors in $W_1^\perp$). By induction, $W_1^\perp$ is a direct sum of pairwise orthogonal
irreducible T-invariant subspaces, and therefore so is $V = W_1 \oplus W_1^\perp$. ∎

From Theorem 10.18, we see that if we are given an orthogonal transformation T on a finite-dimensional Euclidean space V, then $V = W_1 \oplus \cdots \oplus W_r$
is the direct sum of pairwise orthogonal irreducible T-invariant subspaces $W_i$.
But from Theorem 10.17, we see that any such subspace $W_i$ is of dimension
either 1 or 2. Moreover, Theorem 10.17 also showed that if $\dim W_i = 1$, then
the matrix of $T|W_i$ is either (1) or (-1), and if $\dim W_i = 2$, then the matrix of
$T|W_i$ is just the rotation matrix $R_i$ given by

$$R_i = \begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix} .$$

Since each $W_i$ has an orthonormal basis and the bases of distinct $W_i$ are
orthogonal, it follows that we can find an orthonormal basis for V in which
the matrix of T takes the block diagonal form (see Theorem 7.20)

$$(1) \oplus \cdots \oplus (1) \oplus (-1) \oplus \cdots \oplus (-1) \oplus R_1 \oplus \cdots \oplus R_m .$$
These observations prove the next theorem.

Theorem 10.19 Let T be an orthogonal transformation on a finite-dimensional Euclidean space V. Then there exists an orthonormal basis for V in
which the matrix representation of T takes the block diagonal form

$$M_1 \oplus \cdots \oplus M_r$$

where each $M_i$ is one of the following: (+1), (-1), or $R_i$.
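A matrix of the form in Theorem 10.19 is easy to assemble and test. In the Python sketch below (standard library only; the particular blocks and angles are arbitrary choices of ours), `block_diag` forms the direct sum of the blocks and `is_orthogonal` checks $Q^TQ = I$ entrywise within a floating-point tolerance.

```python
import math


def rotation(theta):
    """The 2x2 rotation block R_theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def block_diag(*blocks):
    """Direct sum M_1 + ... + M_r of square blocks, as a list-of-rows matrix."""
    n = sum(len(b) for b in blocks)
    M = [[0.0] * n for _ in range(n)]
    k = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[k + i][k + j] = x
        k += len(b)
    return M


def is_orthogonal(Q, tol=1e-12):
    """Check Q^T Q = I entrywise."""
    n = len(Q)
    return all(
        abs(sum(Q[k][i] * Q[k][j] for k in range(n)) - (1.0 if i == j else 0.0)) < tol
        for i in range(n) for j in range(n)
    )


# Hypothetical example: (1) + (-1) + R_0.3 + R_2.0, a 6x6 orthogonal matrix.
Q = block_diag([[1.0]], [[-1.0]], rotation(0.3), rotation(2.0))
print(len(Q), is_orthogonal(Q))                    # 6 True
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))     # False (a shear)
```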

Exercises 

1. Prove that any nilpotent normal operator is necessarily the zero operator. 

2. Let A and B be normal operators on a finite-dimensional unitary space V.
For notational simplicity, let $v_a$ denote an eigenvector of A corresponding
to the eigenvalue a, let $v_b$ be an eigenvector of B corresponding to the
eigenvalue b, and let $v_{ab}$ denote a simultaneous eigenvector of A and B,
i.e., $Av_{ab} = av_{ab}$ and $Bv_{ab} = bv_{ab}$.

(a) If there exists a basis for V consisting of simultaneous eigenvectors of 
A and B, show that the commutator [A, B] = AB - BA = 0. 

(b) If [A, B] = 0, show that there exists a basis for V consisting entirely of 
simultaneous eigenvectors of A and B. In other words, if [A, B] = 0, then 
A and B can be simultaneously diagonalized. [Hint: There are several 
ways to approach this problem. One way follows easily from Exercise 
8.1.3. Another intuitive method is as follows. First assume that at least one 
of the operators, say A, is nondegenerate. Show that $Bv_a$ is an eigenvector
of A, and that $Bv_a = bv_a$ for some scalar b. Next assume that both A and B
are degenerate. Then $Av_{ai} = av_{ai}$ where the $v_{ai}$ ($i = 1, \dots, m_a$) are
linearly independent eigenvectors corresponding to the eigenvalue a of
multiplicity $m_a$. What does the matrix representation of A look like in the
$\{v_{ai}\}$ basis? Again consider $Bv_{ai}$. What does the matrix representation of
B look like? Now what happens if you diagonalize B?]

3. If $N_1$ and $N_2$ are commuting normal operators, show that the product $N_1N_2$
is normal.

4. Let V be a finite-dimensional complex (real) inner product space, and
suppose $T \in L(V)$. Prove that V has an orthonormal basis of eigenvectors
of T with corresponding eigenvalues of absolute value 1 if and only if T is
unitary (Hermitian and orthogonal).
5. For each of the following matrices A, find an orthogonal or unitary matrix
P and a diagonal matrix D such that $P^\dagger AP = D$:

(c) $\begin{pmatrix} 2 & 3 - 3i \\ 3 + 3i & 5 \end{pmatrix}$

(d) $\begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{pmatrix}$

(e) $\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$

6. Let A, B and C be normal operators on a finite-dimensional unitary space,
and assume that $[A, B] = 0$ but $[B, C] \neq 0$. If all of these operators are
nondegenerate (i.e., all eigenvalues have multiplicity equal to 1), is it true
that $[A, C] \neq 0$? Explain. What if any of these are degenerate?



7. Let V be a finite-dimensional unitary space and suppose $A \in L(V)$.

(a) Prove that $\operatorname{Tr}(AA^\dagger) = 0$ if and only if $A = 0$.

(b) Suppose $N \in L(V)$ is normal and $AN = NA$. Prove that $AN^\dagger = N^\dagger A$.

8. Let A be a positive definite real symmetric matrix on an n-dimensional
Euclidean space V. Using the single variable formula (where $a > 0$)

$$\int_{-\infty}^{\infty} \exp(-ax^2/2)\,dx = (2\pi/a)^{1/2}$$

show that

$$\int_{-\infty}^{\infty} \exp[(-1/2)\langle x, Ax\rangle]\,d^n x = (2\pi)^{n/2}(\det A)^{-1/2}$$

where $d^n x = dx_1 \cdots dx_n$. [Hint: First consider the case where A is
diagonal.]

9. (This is an independent proof of Corollary 2 of Theorem 10.15.) Let $A = (a_{ij}) \in M_3(\mathbb{R})$ be a real symmetric matrix. Thus $A: \mathbb{R}^3 \to \mathbb{R}^3$ is a Hermitian
linear operator with respect to the inner product $\langle\ ,\ \rangle$. Prove there exists an
orthonormal basis of eigenvectors of A using the following approach. (It
should be clear after you have done this that the same proof will work in
$\mathbb{R}^n$ just as well.)

(a) Let $S^2$ be the unit sphere in $\mathbb{R}^3$, and define $f: S^2 \to \mathbb{R}$ by

$$f(x) = \langle Ax, x\rangle .$$

Let $M = \sup f(x)$ and $m = \inf f(x)$ where the sup and inf are taken over $S^2$.
Show that there exist points $x_1, x_1' \in S^2$ such that $f(x_1) = M$ and $f(x_1') = m$.
[Hint: Use Theorem A15.]

(b) Let $C = x(t)$ be any curve on $S^2$ such that $x(0) = x_1$, and let a dot
denote differentiation with respect to the parameter t. Note that $\dot{x}(t)$ is
tangent to C, and hence also to $S^2$. Show that $\langle Ax_1, \dot{x}(0)\rangle = 0$, and thus
deduce that $Ax_1$ is normal to the tangent plane at $x_1$. [Hint: Consider
$df(x(t))/dt|_{t=0}$ and note that C is arbitrary.]

(c) Show that $\langle \dot{x}(t), x(t)\rangle = 0$, and hence conclude that $Ax_1 = \lambda_1 x_1$. [Hint:
Recall that $S^2$ is the unit sphere.]

(d) Argue that $Ax_1' = \lambda_1' x_1'$, and in general, that any critical point of $f(x) = \langle Ax, x\rangle$ on the unit sphere will be an eigenvector of A with critical value
(i.e., eigenvalue) $\lambda_i = \langle Ax_i, x_i\rangle$. (A critical point of f(x) is a point $x_0$ where
$df/dx = 0$, and the critical value of f is just $f(x_0)$.)

(e) Let $[x_1]$ be the 1-dimensional subspace of $\mathbb{R}^3$ spanned by $x_1$. Show that
$[x_1]$ and $[x_1]^\perp$ are both A-invariant subspaces of $\mathbb{R}^3$, and hence that A is
Hermitian on $[x_1]^\perp \cong \mathbb{R}^2$. Note that $[x_1]^\perp$ is a plane through the origin of
$S^2$.

(f) Show that f now must achieve its maximum at a point $x_2$ on the unit
circle $S^1 \subset [x_1]^\perp$, and that $Ax_2 = \lambda_2 x_2$ with $\lambda_2 \leq \lambda_1$.

(g) Repeat this process again by considering the space $[x_2]^\perp \subset [x_1]^\perp$, and
show there exists a vector $x_3 \in [x_2]^\perp$ with $Ax_3 = \lambda_3 x_3$ and $\lambda_3 \leq \lambda_2 \leq \lambda_1$.

10.5 THE SPECTRAL THEOREM 

We now turn to another major topic of this chapter, the so-called spectral 
theorem. This important result is actually nothing more than another way of 
looking at Theorems 8.2 and 10.15. We begin with a simple version that is 
easy to understand and visualize if the reader will refer back to the discussion 
prior to Theorem 7.29. 

Theorem 10.20 Suppose $A \in M_n(\mathbb{C})$ is a diagonalizable matrix with distinct
eigenvalues $\lambda_1, \dots, \lambda_r$. Then A can be written in the form

$$A = \lambda_1E_1 + \cdots + \lambda_rE_r$$

where the $E_i$ are n x n matrices with the following properties:

(a) Each $E_i$ is idempotent (i.e., $E_i^2 = E_i$).

(b) $E_iE_j = 0$ for $i \neq j$.

(c) $E_1 + \cdots + E_r = I$.

(d) $AE_i = E_iA$ for every $E_i$.

Proof Since A is diagonalizable by assumption, let $D = P^{-1}AP$ be the diagonal form of A for some nonsingular matrix P (whose columns are just the
eigenvectors of A). Remember that the diagonal elements of D are just the
eigenvalues $\lambda_i$ of A. Let $P_i$ be the n x n diagonal matrix with diagonal element
1 wherever a $\lambda_i$ occurs in D, and 0's everywhere else. It should be clear that
the collection $\{P_i\}$ obeys properties (a)-(c), and that

$$P^{-1}AP = D = \lambda_1P_1 + \cdots + \lambda_rP_r .$$

If we now define $E_i = PP_iP^{-1}$, then we have

$$A = PDP^{-1} = \lambda_1E_1 + \cdots + \lambda_rE_r$$

where the $E_i$ also obey properties (a)-(c) by virtue of the fact that the $P_i$ do.
Using (a) and (b) in this last equation we find

$$AE_i = (\lambda_1E_1 + \cdots + \lambda_rE_r)E_i = \lambda_iE_i$$

and similarly it follows that $E_iA = \lambda_iE_i$ so that each $E_i$ commutes with A, i.e.,
$E_iA = AE_i$. ∎
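The construction in the proof can be carried out exactly for the matrix of Example 10.6. The Python sketch below (standard library `fractions`; the helper names are ours) takes P to have the eigenvectors (2, 1) and (1, -2) as columns and builds $E_i = PP_iP^{-1}$, after which properties (a)-(d) and the identity $A = 1 \cdot E_1 + 6 \cdot E_2$ can be checked directly.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse_2x2(M):
    """Exact inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


A = [[2, -2], [-2, 5]]
P = [[2, 1], [1, -2]]          # eigenvector columns for lambda = 1 and lambda = 6
Pinv = inverse_2x2(P)

P1 = [[1, 0], [0, 0]]          # 1 where lambda_1 = 1 sits in D = diag(1, 6)
P2 = [[0, 0], [0, 1]]          # 1 where lambda_2 = 6 sits
E1 = matmul(matmul(P, P1), Pinv)
E2 = matmul(matmul(P, P2), Pinv)

print(E1)                                        # entries 4/5, 2/5, 2/5, 1/5
print(E2)                                        # entries 1/5, -2/5, -2/5, 4/5
print(add(scale(1, E1), scale(6, E2)) == A)      # True
```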

By way of terminology, the collection of eigenvalues $\lambda_1, \dots, \lambda_r$ is called
the spectrum of A, the sum $E_1 + \cdots + E_r = I$ is called the resolution of the
identity induced by A, and the expression $A = \lambda_1E_1 + \cdots + \lambda_rE_r$ is called the
spectral decomposition of A. These definitions also apply to arbitrary normal
operators as in Theorem 10.22 below.

Corollary Let A be diagonalizable with spectral decomposition as in
Theorem 10.20. If $f(x) \in \mathbb{C}[x]$ is any polynomial, then

$$f(A) = f(\lambda_1)E_1 + \cdots + f(\lambda_r)E_r .$$

Proof Using properties (a)-(c) in Theorem 10.20, it is easy to see that for
any $m \geq 0$ we have

$$A^m = \lambda_1^m E_1 + \cdots + \lambda_r^m E_r$$

(the case $m = 0$ being just property (c)). The result for arbitrary polynomials now follows easily from this result by linearity. ∎
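As a check of the corollary, the Python sketch below (standard library only) evaluates an arbitrarily chosen polynomial $f(x) = x^2 - 3x + 1$ at the matrix A of Example 10.6 in two ways: directly by Horner's rule, and as $f(1)E_1 + f(6)E_2$, using the projections $E_1 = \frac{1}{5}\begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$ and $E_2 = \frac{1}{5}\begin{pmatrix} 1 & -2 \\ -2 & 4 \end{pmatrix}$ that Theorem 10.20 produces for that matrix.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def poly_matrix(coeffs, A):
    """Horner evaluation of f(A); coefficients are listed from the highest degree down."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)
        R = [[R[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return R


def poly_scalar(coeffs, x):
    """Horner evaluation of f(x) for a scalar x."""
    r = 0
    for c in coeffs:
        r = r * x + c
    return r


A = [[2, -2], [-2, 5]]
E1 = [[Fraction(4, 5), Fraction(2, 5)], [Fraction(2, 5), Fraction(1, 5)]]
E2 = [[Fraction(1, 5), Fraction(-2, 5)], [Fraction(-2, 5), Fraction(4, 5)]]
spectrum = [(1, E1), (6, E2)]

f = [1, -3, 1]            # f(x) = x^2 - 3x + 1
direct = poly_matrix(f, A)
spectral = [[sum(poly_scalar(f, lam) * E[i][j] for lam, E in spectrum) for j in range(2)]
            for i in range(2)]
print(direct == spectral, direct == [[3, -8], [-8, 15]])   # True True
```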
Before turning to our proof of the spectral theorem, we first prove a simple
but useful characterization of orthogonal projections. 

Theorem 10.21 Let V be an inner product space and suppose $E \in L(V)$.
Then E is an orthogonal projection if and only if $E^2 = E = E^\dagger$.

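Proof We first assume that E is an orthogonal projection. By definition this
means that $E^2 = E$, and hence we must show that $E^\dagger = E$. From Theorem 7.27
we know that $V = \operatorname{Im} E \oplus \operatorname{Ker} E = \operatorname{Im} E \oplus (\operatorname{Im} E)^\perp$. Suppose $v, w \in V$ are
arbitrary. Then we may write $v = v_1 + v_2$ and $w = w_1 + w_2$ where $v_1, w_1 \in \operatorname{Im} E$
and $v_2, w_2 \in (\operatorname{Im} E)^\perp$. Therefore

$$\langle v, Ew\rangle = \langle v_1 + v_2, w_1\rangle = \langle v_1, w_1\rangle + \langle v_2, w_1\rangle = \langle v_1, w_1\rangle$$

and

$$\langle v, E^\dagger w\rangle = \langle Ev, w\rangle = \langle v_1, w_1 + w_2\rangle = \langle v_1, w_1\rangle + \langle v_1, w_2\rangle = \langle v_1, w_1\rangle .$$

In other words, $\langle v, (E - E^\dagger)w\rangle = 0$ for all $v, w \in V$, and hence $E - E^\dagger = 0$ (by
Theorem 10.4(a)).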

On the other hand, if $E^2 = E = E^\dagger$, then we know from Theorem 7.27 that
E is a projection of V on Im E in the direction of Ker E, i.e., $V = \operatorname{Im} E \oplus \operatorname{Ker} E$. Therefore, we need only show that Im E and Ker E are orthogonal
subspaces. To show this, let $w \in \operatorname{Im} E$ and $w' \in \operatorname{Ker} E$ be arbitrary. Then
$Ew = w$ and $Ew' = 0$ so that

$$\langle w', w\rangle = \langle w', Ew\rangle = \langle E^\dagger w', w\rangle = \langle Ew', w\rangle = 0 .$$

(This was also proved independently in Exercise 10.3.5.) ∎
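Theorem 10.21 gives a purely algebraic test. For real matrices $E^\dagger = E^T$, and the Python sketch below (standard library only; both example matrices are our own) contrasts the orthogonal projection onto the line spanned by u = (1, 2) with an oblique projection that is idempotent but not symmetric. The range of the latter, spanned by (1, 0), is not orthogonal to its kernel, spanned by (1, -1).

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_orthogonal_projection(E):
    """Theorem 10.21 test for a real matrix: E^2 = E and E^T = E."""
    return matmul(E, E) == E and transpose(E) == E


u = [1, 2]                                                   # hypothetical direction vector
uu = sum(x * x for x in u)
E_orth = [[Fraction(ui * uj, uu) for uj in u] for ui in u]   # u u^T / <u, u>
E_obl = [[1, 1], [0, 0]]   # projects onto span{(1, 0)} along span{(1, -1)}

print(is_orthogonal_projection(E_orth))     # True
print(matmul(E_obl, E_obl) == E_obl)        # True: E_obl is idempotent ...
print(is_orthogonal_projection(E_obl))      # False: ... but not self-adjoint
```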

We are now in a position to prove the spectral theorem for normal operators. In order to distinguish projection operators from their matrix representations in this theorem, we denote the operators by $\pi_i$ and the corresponding
matrices by $E_i$.

Theorem 10.22 (Spectral Theorem for Normal Operators) Let V be a
finite-dimensional unitary space, and let N be a normal operator on V with
distinct eigenvalues $\lambda_1, \dots, \lambda_r$. Then

(a) $N = \lambda_1\pi_1 + \cdots + \lambda_r\pi_r$ where each $\pi_i$ is the orthogonal projection of V
onto a subspace $W_i = \operatorname{Im}\pi_i$.

(b) $\pi_i\pi_j = 0$ for $i \neq j$.

(c) $\pi_1 + \cdots + \pi_r = 1$.

(d) $V = W_1 \oplus \cdots \oplus W_r$ where the subspaces $W_i$ are mutually orthogonal.

(e) $W_i = \operatorname{Im}\pi_i = \operatorname{Ker}(N - \lambda_i 1)$ is the eigenspace corresponding to $\lambda_i$.

Proof Choose any orthonormal basis $\{e_i\}$ for V, and let A be the matrix representation of N relative to this basis. As discussed following Theorem 7.6,
the normal matrix A has the same eigenvalues as the normal operator N. By
Corollary 1 of Theorem 10.15 we know that A is diagonalizable, and hence
applying Theorem 10.20 we may write

$$A = \lambda_1E_1 + \cdots + \lambda_rE_r$$

where $E_i^2 = E_i$, $E_iE_j = 0$ if $i \neq j$, and $E_1 + \cdots + E_r = I$. Furthermore, A is
diagonalized by a unitary matrix P, and as we saw in the proof of Theorem
10.20, $E_i = PP_iP^\dagger$ where each $P_i$ is a real diagonal matrix. Since each $P_i$ is
clearly Hermitian, this implies that $E_i^\dagger = E_i$, and hence each $E_i$ is an orthogonal projection (Theorem 10.21).

Now define $\pi_i \in L(V)$ as that operator whose matrix representation
relative to the basis $\{e_i\}$ is just $E_i$. From the isomorphism between linear
transformations and their representations (Theorem 5.13), it should be clear
that

$$\begin{aligned} N &= \lambda_1\pi_1 + \cdots + \lambda_r\pi_r \\ \pi_i^\dagger &= \pi_i \\ \pi_i^2 &= \pi_i \\ \pi_i\pi_j &= 0 \quad \text{for } i \neq j \\ \pi_1 + \cdots + \pi_r &= 1 . \end{aligned}$$

Since $\pi_i^2 = \pi_i = \pi_i^\dagger$, Theorem 10.21 tells us that each $\pi_i$ is an orthogonal
projection of V on the subspace $W_i = \operatorname{Im}\pi_i$. Since $\pi_1 + \cdots + \pi_r = 1$, we see
that for any $v \in V$ we have $v = \pi_1v + \cdots + \pi_rv$ so that $V = W_1 + \cdots + W_r$. To
show that this sum is direct suppose, for example, that

$$w_1 \in W_1 \cap (W_2 + \cdots + W_r) .$$

This means that $w_1 = w_2 + \cdots + w_r$ where $w_i \in W_i$ for each $i = 1, \dots, r$. Since
$w_i \in W_i = \operatorname{Im}\pi_i$, it follows that there exists $v_i \in V$ such that $\pi_iv_i = w_i$ for each
i. Then

$$w_i = \pi_iv_i = \pi_i^2v_i = \pi_iw_i$$

and if $i \neq j$, then $\pi_i\pi_j = 0$ implies

$$\pi_iw_j = (\pi_i\pi_j)v_j = 0 .$$

Applying $\pi_1$ to $w_1 = w_2 + \cdots + w_r$, we obtain $w_1 = \pi_1w_1 = 0$. Hence we have
shown that $W_1 \cap (W_2 + \cdots + W_r) = \{0\}$. Since this argument can clearly be
applied to any of the $W_i$, we have proved that $V = W_1 \oplus \cdots \oplus W_r$.

Next we note that for each i, $\pi_i$ is the orthogonal projection of V on $W_i = \operatorname{Im}\pi_i$ in the direction of $W_i^\perp = \operatorname{Ker}\pi_i$, so that $V = W_i \oplus W_i^\perp$. Moreover, if $j \neq i$ and $w_j \in W_j$, then (as shown above) $\pi_iw_j = 0$, so that $w_j \in \operatorname{Ker}\pi_i = W_i^\perp$. Hence $W_j \subset W_i^\perp$ for each $j \neq i$, and the subspaces $W_i$ are mutually orthogonal. Finally, the fact that
$W_i = \operatorname{Ker}(N - \lambda_i 1)$ was proved in Theorem 7.29. ∎
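For a normal matrix with one-dimensional eigenspaces, each $\pi_i$ is just $u_iu_i^\dagger$ for a unit eigenvector $u_i$. The Python sketch below (standard library only, floating-point tolerances; the example and helper names are ours) checks parts (a)-(c) of Theorem 10.22, and that each $\pi_i$ is an orthogonal projection, for the normal but non-Hermitian matrix $N = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ with eigenvalues $1 \pm i$.

```python
import math


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A^dagger."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def outer(u, v):
    """The rank-one matrix u v^dagger."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def close(A, B, tol=1e-12):
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


N = [[1, -1], [1, 1]]              # normal, eigenvalues 1 + i and 1 - i
s = 1 / math.sqrt(2)
u1 = [s, -1j * s]                  # unit eigenvector for 1 + i
u2 = [s, 1j * s]                   # unit eigenvector for 1 - i
pi1, pi2 = outer(u1, u1), outer(u2, u2)

print(close(add(scale(1 + 1j, pi1), scale(1 - 1j, pi2)), N))      # (a) True
print(close(matmul(pi1, pi2), [[0, 0], [0, 0]]))                  # (b) True
print(close(add(pi1, pi2), [[1, 0], [0, 1]]))                     # (c) True
print(close(adjoint(pi1), pi1) and close(matmul(pi1, pi1), pi1))  # True
```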

The observant reader will have noticed the striking similarity between the 
spectral theorem and Theorem 7.29. In fact, part of Theorem 10.22 is essen- 
tially a corollary of Theorem 7.29. This is because a normal operator is diag- 
onalizable, and hence satisfies the hypotheses of Theorem 7.29. However, 
note that in the present case we have used the existence of an inner product in 
our proof, whereas in Chapter 7, no such structure was assumed to exist. We 
leave it to the reader to use Theorems 10.15 and 7.28 to construct a simple 
proof of the spectral theorem that makes no reference to any matrix represen- 
tation of the normal operator (see Exercise 10.5.1). 

Theorem 10.23 Let $\sum_{i=1}^r \lambda_iE_i$ be the spectral decomposition of a normal operator N on a finite-dimensional unitary space. Then for each $i = 1, \dots, r$ there
exists a polynomial $f_i(x) \in \mathbb{C}[x]$ such that $f_i(\lambda_j) = \delta_{ij}$ and $f_i(N) = E_i$.

Proof For each $i = 1, \dots, r$ we must find a polynomial $f_i(x) \in \mathbb{C}[x]$ with the
property that $f_i(\lambda_j) = \delta_{ij}$. It should be obvious that the polynomials $f_i(x)$
defined by

$$f_i(x) = \prod_{j \neq i} \frac{x - \lambda_j}{\lambda_i - \lambda_j}$$

have this property. From the corollary to Theorem 10.20 we have $p(N) = \sum_j p(\lambda_j)E_j$ for any $p(x) \in \mathbb{C}[x]$, and hence

$$f_i(N) = \sum_j f_i(\lambda_j)E_j = \sum_j \delta_{ij}E_j = E_i$$

as required. ∎
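The polynomials $f_i$ are the Lagrange interpolation polynomials on the spectrum, so each $E_i$ can be computed without finding any eigenvectors. The Python sketch below (standard library `fractions`; the matrix S and the helper name are our own) applies this to a real symmetric matrix whose eigenvalues are 2, 2 and 5, so that the projection for the eigenvalue 2 has rank 2.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def lagrange_projector(A, eigenvalues, i):
    """Evaluate f_i(A) = prod_{j != i} (A - lambda_j I) / (lambda_i - lambda_j) exactly."""
    n = len(A)
    R = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]   # start from I
    for j, lam_j in enumerate(eigenvalues):
        if j == i:
            continue
        denom = eigenvalues[i] - lam_j
        factor = [[Fraction(A[r][c] - (lam_j if r == c else 0), denom) for c in range(n)]
                  for r in range(n)]
        R = matmul(R, factor)
    return R


S = [[3, 1, 1],
     [1, 3, 1],
     [1, 1, 3]]          # real symmetric; distinct eigenvalues 2 and 5
lams = [2, 5]
E = [lagrange_projector(S, lams, i) for i in range(len(lams))]   # E[0] for 2, E[1] for 5

rebuilt = [[2 * E[0][r][c] + 5 * E[1][r][c] for c in range(3)] for r in range(3)]
print(rebuilt == S)                                          # True: S = 2 E_1 + 5 E_2
print(matmul(E[0], E[1]) == [[0] * 3 for _ in range(3)])     # True: E_1 E_2 = 0
```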
Exercises

1. Use Theorems 10.15 and 7.28 to construct a proof of Theorem 10.22 that 
makes no reference to any matrix representations. 

2. Let N be an operator on a finite-dimensional unitary space. Prove that N is
normal if and only if $N^\dagger = g(N)$ for some polynomial g. [Hint: If N is
normal with eigenvalues $\lambda_1, \dots, \lambda_r$, use Exercise 6.4.2 to show the existence of a polynomial g such that $g(\lambda_i) = \lambda_i^*$ for each i.]

3. Let T be an operator on a finite-dimensional unitary space. Prove that T is
unitary if and only if T is normal and $|\lambda| = 1$ for every eigenvalue $\lambda$ of T.

4. Let H be a normal operator on a finite-dimensional unitary space. Prove 
that H is Hermitian if and only if every eigenvalue of H is real. 
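The hint to Exercise 2 can be tried out numerically: interpolating $\lambda \mapsto \lambda^*$ on the spectrum gives a polynomial g, and g(N) should reproduce $N^\dagger$. The Python sketch below (standard library only; `poly_at_matrix_interp` is a hypothetical helper based on Lagrange interpolation) does this for the normal matrix $N = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ with eigenvalues $1 \pm i$. It illustrates one direction of the exercise and proves nothing.

```python
def matmul(A, B):
    """Product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A^dagger."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def close(A, B, tol=1e-12):
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def poly_at_matrix_interp(A, nodes, values):
    """Evaluate at A the Lagrange polynomial g with g(nodes[i]) = values[i]."""
    n = len(A)
    total = [[0j] * n for _ in range(n)]
    for i, (x_i, y_i) in enumerate(zip(nodes, values)):
        term = [[complex(r == c) for c in range(n)] for r in range(n)]   # identity
        for j, x_j in enumerate(nodes):
            if j != i:
                factor = [[(A[r][c] - (x_j if r == c else 0)) / (x_i - x_j)
                           for c in range(n)] for r in range(n)]
                term = matmul(term, factor)
        total = [[t + y_i * x for t, x in zip(tr, xr)] for tr, xr in zip(total, term)]
    return total


N = [[1, -1], [1, 1]]                        # normal, eigenvalues 1 + i and 1 - i
lams = [1 + 1j, 1 - 1j]
g_of_N = poly_at_matrix_interp(N, lams, [lam.conjugate() for lam in lams])
print(close(g_of_N, adjoint(N)))             # True: N^dagger = g(N)
```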

10.6 THE MATRIX EXPONENTIAL SERIES 

We now use Theorem 10.20 to prove a very useful result, namely, that any
unitary matrix U can be written in the form $e^{iH}$ for some Hermitian matrix H.
Before proving this however, we must first discuss some of the theory of
sequences and series of matrices. In particular, we must define just what is
meant by expressions of the form $e^{iH}$. If the reader already knows something
about sequences and series of numbers, then the rest of this section should
present no difficulty. However, for those readers who may need some review,
we have provided all of the necessary material in Appendix B.

Let $\{S_r\}$ be a sequence of complex matrices where each $S_r \in M_n(\mathbb{C})$ has
entries $s^{(r)}_{ij}$. We say that $\{S_r\}$ converges to the limit $S = (s_{ij}) \in M_n(\mathbb{C})$ if each
of the $n^2$ sequences $\{s^{(r)}_{ij}\}$ converges to a limit $s_{ij}$. We then write $S_r \to S$ or
$\lim_{r \to \infty} S_r = S$ (or even simply $\lim S_r = S$). In other words, a sequence $\{S_r\}$ of
matrices converges if and only if every entry of $S_r$ forms a convergent
sequence.

Similarly, an infinite series of matrices

$$\sum_{r=1}^{\infty} A_r$$

where $A_r = (a^{(r)}_{ij})$ is said to be convergent to the sum $S = (s_{ij})$ if the sequence
of partial sums

$$S_m = \sum_{r=1}^{m} A_r$$

converges to S. Another way to say this is that the series $\sum A_r$ converges to S
if and only if each of the $n^2$ series $\sum a^{(r)}_{ij}$ converges to $s_{ij}$ for each $i, j = 1, \dots, n$. We adhere to the convention of leaving off the limits in a series if they are
infinite.

Our next theorem proves several intuitively obvious properties of 
sequences and series of matrices. 

Theorem 10.24 (a) Let $\{S_r\}$ be a convergent sequence of n x n matrices
with limit S, and let P be any n x n matrix. Then $PS_r \to PS$ and $S_rP \to SP$.

(b) If $S_r \to S$ and P is nonsingular, then $P^{-1}S_rP \to P^{-1}SP$.

(c) If $\sum A_r$ converges to A and P is nonsingular, then $\sum P^{-1}A_rP$ converges
to $P^{-1}AP$.

Proof (a) Since $S_r \to S$, we have $\lim s^{(r)}_{ij} = s_{ij}$ for all $i, j = 1, \dots, n$.
Therefore

$$\lim (PS_r)_{ij} = \lim\Big(\sum_k p_{ik}s^{(r)}_{kj}\Big) = \sum_k p_{ik}\lim s^{(r)}_{kj} = \sum_k p_{ik}s_{kj} = (PS)_{ij} .$$

Since this holds for all $i, j = 1, \dots, n$ we must have $PS_r \to PS$. It should be
obvious that we also have $S_rP \to SP$.

(b) As in part (a), we have

$$\lim (P^{-1}S_rP)_{ij} = \lim\Big(\sum_{k,m} (P^{-1})_{ik}s^{(r)}_{km}p_{mj}\Big) = \sum_{k,m} (P^{-1})_{ik}p_{mj}\lim s^{(r)}_{km} = \sum_{k,m} (P^{-1})_{ik}p_{mj}s_{km} = (P^{-1}SP)_{ij} .$$

Note that we may use part (a) to formally write this as

$$\lim(P^{-1}S_rP) = P^{-1}\lim(S_rP) = P^{-1}SP .$$

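(c) If we write the mth partial sum as

$$S_m = \sum_{r=1}^{m} P^{-1}A_rP = P^{-1}\Big(\sum_{r=1}^{m} A_r\Big)P$$

then we have

$$\lim_{m \to \infty}(S_m)_{ij} = \sum_{k,l}\lim_{m \to \infty}\Big[(P^{-1})_{ik}\Big(\sum_{r=1}^{m} a^{(r)}_{kl}\Big)p_{lj}\Big] = \sum_{k,l}(P^{-1})_{ik}p_{lj}\lim_{m \to \infty}\sum_{r=1}^{m} a^{(r)}_{kl} = \sum_{k,l}(P^{-1})_{ik}p_{lj}a_{kl} = (P^{-1}AP)_{ij} .$$ ∎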



Theorem 10.25 For any $A = (a_{ij}) \in M_n(\mathbb{C})$ the following series converges:
$$\sum_{r=0}^{\infty}\frac{A^r}{r!} = I + A + \frac{A^2}{2!} + \cdots + \frac{A^r}{r!} + \cdots.$$



Proof Choose a positive real number $M > \max\{n, |a_{ij}|\}$ where the max is
taken over all $i, j = 1, \dots, n$. Then $|a_{ij}| < M$ and $n < M < M^2$. Now consider
the term $A^2 = (b_{ij}) = (\sum_k a_{ik}a_{kj})$. We have (by Theorem 2.17, property (N3))
$$|b_{ij}| \le \sum_{k=1}^{n}|a_{ik}|\,|a_{kj}| < \sum_{k=1}^{n}M^2 = nM^2 < M^4.$$
Proceeding by induction, suppose that for $A^r = (c_{ij})$ it has been shown that
$|c_{ij}| < M^{2r}$. Then $A^{r+1} = (d_{ij})$ where
$$|d_{ij}| \le \sum_{k=1}^{n}|a_{ik}|\,|c_{kj}| < nMM^{2r} = nM^{2r+1} < M^{2(r+1)}.$$
This proves that $A^r = (a^{(r)}_{ij})$ has the property that $|a^{(r)}_{ij}| < M^{2r}$ for every $r \ge 1$.
Now, for each of the $n^2$ pairs $i, j = 1, \dots, n$ we have
$$\sum_{r=0}^{\infty}\frac{|a^{(r)}_{ij}|}{r!} \le \sum_{r=0}^{\infty}\frac{M^{2r}}{r!} = \exp(M^2)$$
so that each of these $n^2$ series (i.e., for each $i, j = 1, \dots, n$) must converge
(Theorem B26(a)). Hence the series $I + A + A^2/2! + \cdots$ must converge
(Theorem B20). ∎
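The bound used in the proof can be checked numerically. The sketch below (Python, standard library only) takes a hypothetical $2 \times 2$ integer matrix with $\max|a_{ij}| = 3$ and $n = 2$, so that $M = 4$ satisfies $M > \max\{n, |a_{ij}|\}$, verifies $|a^{(r)}_{ij}| < M^{2r}$ exactly for the first few powers, and then compares two partial sums of the exponential series.

```python
from math import factorial

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power_bound_holds(A, M, rmax):
    """Check |(A^r)_ij| < M^(2r) for r = 1, ..., rmax, using exact integers."""
    n = len(A)
    power = identity(n)
    for r in range(1, rmax + 1):
        power = matmul(power, A)
        if any(abs(power[i][j]) >= M ** (2 * r)
               for i in range(n) for j in range(n)):
            return False
    return True

def exp_partial_sum(A, m):
    """Partial sum I + A + A^2/2! + ... + A^m/m!, computed entry by entry."""
    n = len(A)
    total = [[float(x) for x in row] for row in identity(n)]
    power = identity(n)
    for r in range(1, m + 1):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] / factorial(r) for j in range(n)]
                 for i in range(n)]
    return total

# Hypothetical example: n = 2 and max |a_ij| = 3, so M = 4 > max{n, |a_ij|}.
A = [[1, 2], [-3, 1]]
M = 4
```

The bound $M^{2r}$ is far from sharp for this matrix, but the proof only needs some bound whose series $\sum M^{2r}/r!$ converges.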

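We call the series in Theorem 10.25 the matrix exponential series, and
denote its sum by $e^A = \exp A$. In general, the series for $e^A$ is extremely difficult, if not impossible, to evaluate. However, there are important exceptions.

Example 10.7 Let $A$ be the diagonal matrix
$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
Then it is easy to see that
$$A^r = \begin{pmatrix} \lambda_1^{\,r} & 0 & \cdots & 0 \\ 0 & \lambda_2^{\,r} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^{\,r} \end{pmatrix}$$
and hence
$$\exp A = I + A + \frac{A^2}{2!} + \cdots = \begin{pmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{pmatrix}. \quad ∆$$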
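Example 10.7 can be confirmed by summing the series directly. The sketch below (Python, standard library only) truncates the exponential series after a fixed number of terms (40 here, an arbitrary choice) and compares the result with the closed form for a hypothetical diagonal matrix.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

lambdas = [0.5, -1.0, 2.0]                           # hypothetical eigenvalues
series_value = exp_series(diag(lambdas))
closed_form = diag([math.exp(x) for x in lambdas])
```

Only the diagonal entries are affected, each one following the scalar series for $e^{\lambda_i}$, exactly as in the example.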











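Example 10.8 Consider the $2 \times 2$ matrix
$$J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
and let
$$A = \theta J = \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix}$$
where $\theta \in \mathbb{R}$. Then noting that $J^2 = -I$, we see that $A^2 = -\theta^2I$, $A^3 = -\theta^3J$, $A^4 =
\theta^4I$, $A^5 = \theta^5J$, $A^6 = -\theta^6I$, and so forth. From elementary calculus we know
that
$$\sin\theta = \theta - \theta^3/3! + \theta^5/5! - \cdots$$
and
$$\cos\theta = 1 - \theta^2/2! + \theta^4/4! - \cdots$$
and hence
$$\begin{aligned} e^A &= I + A + A^2/2! + \cdots \\ &= I + \theta J - \theta^2I/2! - \theta^3J/3! + \theta^4I/4! + \theta^5J/5! - \theta^6I/6! + \cdots \\ &= I(1 - \theta^2/2! + \theta^4/4! - \cdots) + J(\theta - \theta^3/3! + \theta^5/5! - \cdots) \\ &= (\cos\theta)I + (\sin\theta)J. \end{aligned}$$
In other words, using the explicit forms of $I$ and $J$ we see that
$$e^A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
so that $e^{\theta J}$ represents a rotation in $\mathbb{R}^2$ by an angle $\theta$. ∆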
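The same truncated series reproduces the rotation matrix of Example 10.8. In this sketch (Python, standard library only) the angle $\theta = 0.7$ is a hypothetical value, and truncating after 40 terms is an assumption that is ample for an angle of this size.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

theta = 0.7                                          # hypothetical angle
J = [[0.0, -1.0], [1.0, 0.0]]
R = exp_series([[theta * x for x in row] for row in J])   # e^{theta J}
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```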

Theorem 10.26 Let $A \in M_n(\mathbb{C})$ be diagonalizable, and let $\lambda_1, \dots, \lambda_r$ be the
distinct eigenvalues of $A$. Then the matrix power series
$$\sum_{s=0}^{\infty} a_sA^s$$
converges if and only if the series
$$\sum_{s=0}^{\infty} a_s\lambda_i^{\,s}$$
converges for each $i = 1, \dots, r$.

Proof Since $A$ is diagonalizable, choose a nonsingular matrix $P$ such that $D =
P^{-1}AP$ is diagonal. It is then easy to see that for every $s \ge 0$ we have
$$a_sD^s = a_sP^{-1}A^sP = P^{-1}a_sA^sP$$
where the $n$ diagonal entries of $D^s$ are just the numbers $\lambda_i^{\,s}$. By Theorem
10.24(c), we know that $\sum a_sA^s$ converges if and only if $\sum a_sD^s$ converges. But
by definition of series convergence, $\sum a_sD^s$ converges if and only if $\sum a_s\lambda_i^{\,s}$
converges for every $i = 1, \dots, r$. ∎

Theorem 10.27 Let $f(x) = a_0 + a_1x + a_2x^2 + \cdots$ be any power series with
coefficients in $\mathbb{C}$, and let $A \in M_n(\mathbb{C})$ be diagonalizable with spectral decomposition $A = \lambda_1E_1 + \cdots + \lambda_rE_r$. Then, if the series
$$f(A) = a_0I + a_1A + a_2A^2 + \cdots$$
converges, its sum is
$$f(A) = f(\lambda_1)E_1 + \cdots + f(\lambda_r)E_r.$$

Proof As in the proof of Theorem 10.20, let the diagonal form of $A$ be
$$D = P^{-1}AP = \lambda_1P_1 + \cdots + \lambda_rP_r$$
so that $E_i = PP_iP^{-1}$. Now note that
$$\begin{aligned} P^{-1}f(A)P &= a_0P^{-1}P + a_1P^{-1}AP + a_2P^{-1}AP\,P^{-1}AP + \cdots \\ &= f(P^{-1}AP) \\ &= a_0I + a_1D + a_2D^2 + \cdots \\ &= f(D). \end{aligned}$$
Using properties (a)-(c) of Theorem 10.20 applied to the $P_i$, it is easy to see
that $D^k = \lambda_1^{\,k}P_1 + \cdots + \lambda_r^{\,k}P_r$ and hence
$$f(D) = f(\lambda_1)P_1 + \cdots + f(\lambda_r)P_r.$$
Then if $f(A) = \sum a_kA^k$ converges, so does $\sum P^{-1}a_kA^kP = P^{-1}f(A)P = f(D)$ (Theorem
10.24(c)), and we have
$$f(A) = f(PDP^{-1}) = Pf(D)P^{-1} = f(\lambda_1)E_1 + \cdots + f(\lambda_r)E_r. \quad ∎$$

Example 10.9 Consider the exponential series $e^A$ where $A$ is diagonalizable.
Then, if $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $A$, we have the spectral
decomposition $A = \lambda_1E_1 + \cdots + \lambda_kE_k$. Using $f(A) = e^A$, Theorem 10.27 yields
$$e^A = e^{\lambda_1}E_1 + \cdots + e^{\lambda_k}E_k$$
in agreement with Example 10.7. ∆
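Theorem 10.27 can be tested on a small example. The sketch below (Python, standard library only) uses a hypothetical matrix $A$ with distinct eigenvalues 2 and 3, and obtains its spectral projections from the standard interpolation formula $E_i = \prod_{j \ne i}(A - \lambda_jI)/(\lambda_i - \lambda_j)$ for a diagonalizable matrix (an assumption of this sketch, not derived in the text). It then compares $e^{\lambda_1}E_1 + e^{\lambda_2}E_2$ with the truncated exponential series.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

def lin_comb(pairs):
    """Return the sum of c * X over the (c, X) pairs (all X the same size)."""
    n = len(pairs[0][1])
    total = [[0.0] * n for _ in range(n)]
    for c, X in pairs:
        total = [[t + c * x for t, x in zip(rt, rx)] for rt, rx in zip(total, X)]
    return total

A = [[2.0, 1.0], [0.0, 3.0]]             # hypothetical: eigenvalues 2 and 3
I = [[1.0, 0.0], [0.0, 1.0]]
E1 = lin_comb([(-1.0, A), (3.0, I)])     # (A - 3I)/(2 - 3)
E2 = lin_comb([(1.0, A), (-2.0, I)])     # (A - 2I)/(3 - 2)
spectral = lin_comb([(math.exp(2.0), E1), (math.exp(3.0), E2)])
series = exp_series(A)
```

The checks confirm $E_1 + E_2 = I$, $E_1E_2 = 0$ and $A = 2E_1 + 3E_2$ before comparing the two expressions for $e^A$.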

We can now prove our earlier assertion that a unitary matrix $U$ can be
written in the form $e^{iH}$ for some Hermitian matrix $H$.

Theorem 10.28 Every unitary matrix $U$ can be written in the form $e^{iH}$ for
some Hermitian matrix $H$. Conversely, if $H$ is Hermitian, then $e^{iH}$ is unitary.

Proof By Theorem 10.9(b), the distinct eigenvalues of $U$ may be written in
the form $e^{i\lambda_1}, \dots, e^{i\lambda_k}$ where each $\lambda_i$ is real. Since $U$ is also normal, it follows from Corollary 1 of Theorem 10.15 that there exists a unitary matrix $P$
such that $P^\dagger UP = P^{-1}UP$ is diagonal. In fact
$$P^{-1}UP = e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k$$
where the $P_i$ are the idempotent matrices used in the proof of Theorem 10.20.

From Example 10.7 we see that the matrix $e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k$ is just $e^{iD}$
where
$$D = \lambda_1P_1 + \cdots + \lambda_kP_k$$
is a diagonal matrix with the $\lambda_i$ as diagonal entries. Therefore, using Theorem
10.24(c) we see that
$$U = Pe^{iD}P^{-1} = e^{iPDP^{-1}} = e^{iH}$$
where $H = PDP^{-1}$. Since $D$ is a real diagonal matrix it is clearly Hermitian, and
since $P$ is unitary (so that $P^{-1} = P^\dagger$), it follows that $H^\dagger = (PDP^\dagger)^\dagger = PDP^\dagger = H$
so that $H$ is Hermitian also.

Conversely, suppose $H$ is Hermitian with distinct real eigenvalues $\lambda_1, \dots,
\lambda_k$. Since $H$ is also normal, there exists a unitary matrix $P$ that diagonalizes $H$.
Then as above, we may write this diagonal matrix as
$$P^{-1}HP = \lambda_1P_1 + \cdots + \lambda_kP_k$$
so that (from Example 10.7 again)
$$P^{-1}e^{iH}P = e^{iP^{-1}HP} = e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k.$$
Using the properties of the $P_i$, it is easy to see that the right hand side of this
equation is diagonal and unitary since using
$$(e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k)^\dagger = e^{-i\lambda_1}P_1 + \cdots + e^{-i\lambda_k}P_k$$
we have
$$(e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k)^\dagger(e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k) = I$$
and
$$(e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k)(e^{i\lambda_1}P_1 + \cdots + e^{i\lambda_k}P_k)^\dagger = I.$$
Therefore the left hand side must also be unitary, and hence (using $P^{-1} = P^\dagger$)
$$\begin{aligned} I &= (P^{-1}e^{iH}P)^\dagger(P^{-1}e^{iH}P) \\ &= P^\dagger(e^{iH})^\dagger PP^{-1}e^{iH}P \\ &= P^\dagger(e^{iH})^\dagger e^{iH}P \end{aligned}$$
so that, multiplying on the left by $P$ and on the right by $P^{-1}$, we obtain $PP^{-1} = I = (e^{iH})^\dagger e^{iH}$. Similarly we see that $e^{iH}(e^{iH})^\dagger = I$, and thus $e^{iH}$ is
unitary. ∎
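A numerical check of the converse half of Theorem 10.28: for a hypothetical Hermitian matrix $H$, the truncated series for $e^{iH}$ should satisfy $UU^\dagger = U^\dagger U = I$ up to rounding. The sketch (Python, standard library only) also recovers the rotation of Example 10.8, which is $e^{iK}$ with the Hermitian matrix $K = -i\theta J$; the angle is again a hypothetical value.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

def dagger(X):
    """Conjugate transpose of a square matrix."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def close(X, Y, tol=1e-10):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 2 - 1j], [2 + 1j, -1.0]]                      # hypothetical Hermitian matrix
U = exp_series([[1j * h for h in row] for row in H])     # U = e^{iH}

theta = 0.7                                              # hypothetical angle
K = [[0.0, 1j * theta], [-1j * theta, 0.0]]              # K = -i theta J
R = exp_series([[1j * k for k in row] for row in K])     # e^{iK} = e^{theta J}
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```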

While this theorem is also true in infinite dimensions (i.e., in a Hilbert 
space), its proof is considerably more difficult. The reader is referred to the 
books listed in the bibliography for this generalization. 

Given a constant matrix $A$, we now wish to show that
$$\frac{d}{dt}e^{tA} = Ae^{tA}. \tag{1}$$
To see this, we first define the derivative of a matrix $M = M(t)$ to be that
matrix whose elements are just the derivatives of the corresponding elements
of $M$. In other words, if $M(t) = (m_{ij}(t))$, then $(dM/dt)_{ij} = dm_{ij}/dt$. Now note that
(with $M(t) = tA$)
$$e^{tA} = I + tA + (tA)^2/2! + (tA)^3/3! + \cdots$$
and hence (since the $a_{ij}$ are constant, and each entry of $e^{tA}$ is a power series in $t$ that converges for all $t$ and so may be differentiated term by term) taking the derivative with respect to $t$
yields the desired result:
$$\begin{aligned} de^{tA}/dt &= 0 + A + tA^2 + (tA)^2A/2! + \cdots \\ &= A\{I + tA + (tA)^2/2! + \cdots\} \\ &= Ae^{tA}. \end{aligned}$$
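Equation (1) can be checked with a central difference quotient. In this sketch (Python, standard library only) the matrix $A$, the time $t = 0.5$ and the step $h = 10^{-5}$ are hypothetical choices; the difference quotient should agree with $Ae^{tA}$ to within the discretization error.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

def close(X, Y, tol):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 1.0], [-2.0, -3.0]]                       # hypothetical constant matrix

def exp_tA(t):
    """e^{tA} from the truncated series."""
    return exp_series([[t * a for a in row] for row in A])

t, h = 0.5, 1e-5
plus, minus = exp_tA(t + h), exp_tA(t - h)
# Entrywise central difference quotient approximating d(e^{tA})/dt at t.
derivative = [[(p - q) / (2 * h) for p, q in zip(rp, rq)]
              for rp, rq in zip(plus, minus)]
```

The second check below also confirms numerically that $A$ commutes with $e^{tA}$, so (1) could equally be written with $A$ on the right.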

Next, given two square matrices $A$ and $B$ of the same size, we recall that
their commutator is the matrix $[A, B] = AB - BA = -[B, A]$. If $[A, B] = 0$,
then $AB = BA$ and we say that $A$ and $B$ commute. Now consider the function
$f(x) = e^{xA}Be^{-xA}$. Leaving it to the reader to verify that the product rule for
derivatives also holds for matrices, we obtain (note that $Ae^{xA} = e^{xA}A$)
$$\begin{aligned} df/dx &= Ae^{xA}Be^{-xA} - e^{xA}Be^{-xA}A = Af - fA = [A, f] \\ d^2f/dx^2 &= [A, df/dx] = [A, [A, f]]. \end{aligned}$$
Expanding $f(x)$ in a Taylor series about $x = 0$, we find (using $f(0) = B$)
$$\begin{aligned} f(x) &= f(0) + (df/dx)_0\,x + (d^2f/dx^2)_0\,x^2/2! + \cdots \\ &= B + [A, B]x + [A, [A, B]]x^2/2! + \cdots. \end{aligned}$$
Setting $x = 1$, we finally obtain
$$e^ABe^{-A} = B + [A, B] + [A, [A, B]]/2! + [A, [A, [A, B]]]/3! + \cdots \tag{2}$$
Note that setting $B = I$ shows that $e^Ae^{-A} = I$ as we would hope.
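Formula (2) can be compared numerically with its left side. The sketch (Python, standard library only) truncates both the exponential series and the nested-commutator series; the matrices $A$ and $B$ are hypothetical, and $A$ is chosen small so that 30 terms of the commutator series and 40 terms of each exponential suffice.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, m=40):
    """Truncated series I + A + A^2/2! + ... + A^m/m!, computed entrywise."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]                 # holds A^r / r!
    for r in range(1, m + 1):
        term = [[x / r for x in row] for row in matmul(term, A)]
        total = [[t + s for t, s in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(XY, YX)]

def conjugate_by_exp(A, B):
    """Left side of (2): e^A B e^{-A}."""
    minus_A = [[-a for a in row] for row in A]
    return matmul(matmul(exp_series(A), B), exp_series(minus_A))

def commutator_series(A, B, terms=30):
    """Right side of (2): B + [A, B] + [A, [A, B]]/2! + ..., truncated."""
    total = [row[:] for row in B]
    nested = [row[:] for row in B]
    for k in range(1, terms):
        nested = commutator(A, nested)               # k-fold nested commutator
        total = [[t + x / math.factorial(k) for t, x in zip(rt, rx)]
                 for rt, rx in zip(total, nested)]
    return total

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 0.3], [-0.2, 0.1]]                        # hypothetical small matrix
B = [[1.0, 2.0], [3.0, 4.0]]                         # hypothetical
I2 = [[1.0, 0.0], [0.0, 1.0]]
```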

In the particular case that both $A$ and $B$ commute with their commutator
$[A, B]$, then we find from (2) that $e^ABe^{-A} = B + [A, B]$ and hence $e^AB =
Be^A + [A, B]e^A$ or
$$[e^A, B] = [A, B]e^A. \tag{3}$$

Example 10.10 We now show that if $A$ and $B$ are two matrices that both
commute with their commutator $[A, B]$, then
$$e^Ae^B = \exp\{A + B + [A, B]/2\}. \tag{4}$$
(This is sometimes referred to as Weyl's formula.) To prove this, we start
with the function $f(x) = e^{xA}e^{xB}e^{-x(A+B)}$. Then
$$\begin{aligned} df/dx &= e^{xA}Ae^{xB}e^{-x(A+B)} + e^{xA}e^{xB}Be^{-x(A+B)} - e^{xA}e^{xB}(A + B)e^{-x(A+B)} \\ &= e^{xA}Ae^{xB}e^{-x(A+B)} - e^{xA}e^{xB}Ae^{-x(A+B)} \\ &= e^{xA}[A, e^{xB}]e^{-x(A+B)}. \end{aligned} \tag{5}$$
As a special case, note $[A, B] = 0$ implies $df/dx = 0$ so that $f$ is independent of
$x$. Since $f(0) = I$, it follows that we may choose $x = 1$ to obtain $e^Ae^Be^{-(A+B)} =
I$ or $e^Ae^B = e^{A+B}$ (as long as $[A, B] = 0$).

From (3) we have (replacing $A$ by $xB$ and $B$ by $A$) $[A, e^{xB}] = x[A, B]e^{xB}$.
Using this along with the fact that $A$ commutes with the commutator $[A, B]$
(so that $e^{xA}[A, B] = [A, B]e^{xA}$), we have
$$df/dx = xe^{xA}[A, B]e^{xB}e^{-x(A+B)} = x[A, B]f.$$
Since $A$ and $B$ are independent of $x$, we may formally integrate this from $0$ to
$x$ to obtain
$$\ln f(x)/f(0) = [A, B]x^2/2.$$
Using $f(0) = I$, this is $f(x) = \exp\{[A, B]x^2/2\}$ so that setting $x = 1$ we find
$$e^Ae^Be^{-(A+B)} = \exp\{[A, B]/2\}.$$
Finally, multiplying this equation from the right by $e^{A+B}$ and using the fact
that $[[A, B]/2, A + B] = 0$ (so that, by the special case above, $e^{[A,B]/2}e^{A+B} = e^{A+B+[A,B]/2}$) yields (4). ∆
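Weyl's formula (4) can be verified exactly on strictly upper triangular $3 \times 3$ matrices. With the hypothetical choice $A = aE_{12}$ and $B = bE_{23}$, where $E_{ij}$ denotes the matrix with a 1 in position $(i, j)$ and zeros elsewhere, we have $[A, B] = abE_{13}$, which commutes with both $A$ and $B$. All matrices involved are nilpotent, so the exponential series terminate and the Python sketch below (standard library only, exact rational arithmetic, 0-based indices in code) involves no truncation error.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum((X[i][k] * Y[k][j] for k in range(n)), Fraction(0))
             for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def unit(i, j, c, n=3):
    """n x n matrix with c in position (i, j) (0-based) and zeros elsewhere."""
    return [[c if (p, q) == (i, j) else Fraction(0) for q in range(n)]
            for p in range(n)]

def exp_nilpotent(N):
    """e^N for strictly upper triangular N: N^n = 0, so the series is finite."""
    n = len(N)
    identity = [[Fraction(1) if p == q else Fraction(0) for q in range(n)]
                for p in range(n)]
    total, term = identity, identity
    for r in range(1, n):
        term = scale(Fraction(1, r), matmul(term, N))    # N^r / r!
        total = add(total, term)
    return total

a, b = Fraction(2), Fraction(3)                          # hypothetical scalars
A, B = unit(0, 1, a), unit(1, 2, b)                      # A = a E_12, B = b E_23
C = add(matmul(A, B), scale(-1, matmul(B, A)))           # [A, B]
```

The last check shows why the correction term matters: for these non-commuting matrices $e^Ae^B \ne e^{A+B}$.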

Exercises 

1. (a) Let $N$ be a normal operator on a finite-dimensional unitary space.
Prove that
$$\det e^N = e^{\operatorname{Tr} N}.$$

(b) Prove this holds for any $N \in M_n(\mathbb{C})$. [Hint: Use either Theorem 8.1 or
the fact (essentially proved at the end of Section 8.6) that the diagonalizable matrices are dense in $M_n(\mathbb{C})$.]

2. If the limit of a sequence of unitary operators exists, is it also unitary?
Why?

3. Let $T$ be a unitary operator. Show that the sequence $\{T^n : n = 0, 1, 2, \dots\}$
contains a subsequence $\{T^{n_k} : k = 0, 1, 2, \dots\}$ that converges to a unitary
operator. [Hint: You will need the fact that the unit disk in $\mathbb{C}^2$ is compact
(see Appendix A).]

10.7 POSITIVE OPERATORS 

Before proving the main result of this section (the polar decomposition
theorem), let us briefly discuss functions of a linear transformation. We have
already seen two examples of such a function. First, the exponential series $e^A$
(which may be defined for operators exactly as for matrices) and second, if $A$
is a normal operator with spectral decomposition $A = \sum\lambda_iE_i$, then we saw that
the linear transformation $p(A)$ was given by $p(A) = \sum p(\lambda_i)E_i$ where $p(x)$ is any
polynomial in $\mathbb{C}[x]$ (Corollary to Theorem 10.20).

In order to generalize this notion, let $N$ be a normal operator on a unitary
space, and hence $N$ has spectral decomposition $\sum\lambda_iE_i$. If $f$ is an arbitrary
complex-valued function (defined at least at each of the $\lambda_i$), we define a linear
transformation $f(N)$ by
$$f(N) = \sum f(\lambda_i)E_i.$$

What we are particularly interested in is the function $f(x) = \sqrt{x}$ defined for all
real $x \ge 0$ as the nonnegative square root of $x$.

Recall (see Section 10.3) that we defined a positive operator $P$ by the
requirement that $P = S^\dagger S$ for some operator $S$. It is then clear that $P^\dagger = P$, and
hence $P$ is normal. From Theorem 10.9(d), the eigenvalues of $P = \sum\lambda_jE_j$ are
real and non-negative, and we can define $\sqrt{P}$ by
$$\sqrt{P} = \sum_j\sqrt{\lambda_j}\,E_j$$
where each $\lambda_j \ge 0$.

Using the properties of the $E_j$, it is easy to see that $(\sqrt{P})^2 = P$. Furthermore,
since $E_j$ is an orthogonal projection, it follows that $E_j^\dagger = E_j$ (Theorem 10.21),
and therefore $(\sqrt{P})^\dagger = \sqrt{P}$ so that $\sqrt{P}$ is Hermitian. Note that since $P = S^\dagger S$ we
have
$$(Pv, v) = ((S^\dagger S)v, v) = (Sv, Sv) = \|Sv\|^2 \ge 0.$$
Just as we did in the proof of Theorem 10.23, let us write $v = \sum E_jv = \sum v_j$
where the nonzero $v_j$ are mutually orthogonal. Then
$$\sqrt{P}(v) = \sum\sqrt{\lambda_j}\,E_jv = \sum\sqrt{\lambda_j}\,v_j$$
and hence we also have (using $(v_j, v_k) = 0$ if $j \ne k$)
$$(\sqrt{P}(v), v) = \Big(\sum_j\sqrt{\lambda_j}\,v_j, \sum_k v_k\Big) = \sum_{j,k}\sqrt{\lambda_j}\,(v_j, v_k) = \sum_j\sqrt{\lambda_j}\,(v_j, v_j) = \sum_j\sqrt{\lambda_j}\,\|v_j\|^2 \ge 0.$$
In summary, we have shown that $\sqrt{P}$ satisfies

(a) $(\sqrt{P})^2 = P$

(b) $(\sqrt{P})^\dagger = \sqrt{P}$

(c) $(\sqrt{P}(v), v) \ge 0$ for every $v \in V$

and it is natural to ask about the uniqueness of any operator satisfying these
three properties. For example, if we let $T = \sum\pm\sqrt{\lambda_j}\,E_j$, then we still have $T^2 =
\sum\lambda_jE_j = P$ regardless of the sign chosen for each term. Let us denote the fact
that $\sqrt{P}$ satisfies properties (b) and (c) above by the expression $\sqrt{P} \ge 0$. In
other words, by the statement $A \ge 0$ we mean that $A^\dagger = A$ and $(Av, v) \ge 0$ for
every $v \in V$ (i.e., $A$ is a positive Hermitian operator).

We now claim that if $P = T^2$ and $T \ge 0$, then $T = \sqrt{P}$. To prove this, we
first note that $T \ge 0$ implies $T^\dagger = T$ (property (b)), and hence $T$ must also be
normal. Now let $\sum\mu_iF_i$ be the spectral decomposition of $T$. Then
$$\sum(\mu_i)^2F_i = T^2 = P = \sum\lambda_jE_j.$$
If $v_i \ne 0$ is an eigenvector of $T$ corresponding to $\mu_i$, then property (c) tells us
that (using the fact that each $\mu_i$ is real since $T$ is Hermitian)
$$0 \le (Tv_i, v_i) = (\mu_iv_i, v_i) = \mu_i\|v_i\|^2.$$
But $\|v_i\| > 0$, and hence $\mu_i \ge 0$. In other words, any operator $T \ge 0$ has nonnegative eigenvalues. Since the $\mu_i$ are distinct and nonnegative, the $\mu_i^2$ are also distinct,
and hence each $\mu_i^2$ must be equal to some $\lambda_j$. Therefore the corresponding $F_i$
and $E_j$ must be equal (by Theorem 10.22(e)). By suitably numbering the
eigenvalues, we may write $\mu_i^2 = \lambda_i$, and thus $\mu_i = \sqrt{\lambda_i}$. This shows that
$$T = \sum\mu_iF_i = \sum\sqrt{\lambda_i}\,E_i = \sqrt{P}$$
as claimed.

We summarize this discussion in the next result which gives us three 
equivalent definitions of a positive transformation. 

Theorem 10.29 Let $P$ be an operator on a unitary space $V$. Then the following conditions are equivalent:

(a) $P = T^2$ for some unique Hermitian operator $T \ge 0$.

(b) $P = S^\dagger S$ for some operator $S$.

(c) $P^\dagger = P$ and $(Pv, v) \ge 0$ for every $v \in V$.

Proof (a) ⇒ (b): If $P = T^2$ and $T^\dagger = T$, then $P = TT = T^\dagger T$.

(b) ⇒ (c): If $P = S^\dagger S$, then $P^\dagger = P$ and $(Pv, v) = (S^\dagger Sv, v) = (Sv, Sv) =
\|Sv\|^2 \ge 0$.

(c) ⇒ (a): Note that property (c) is just our statement that $P \ge 0$. Since
$P^\dagger = P$, we see that $P$ is normal, and hence we may write $P = \sum\lambda_jE_j$, where each $\lambda_j \ge 0$ by the argument given above for operators $T \ge 0$. Defining
$T = \sum\sqrt{\lambda_j}\,E_j$, we have $T^\dagger = T$ (since every $E_j$ is Hermitian), and the preceding
discussion shows that $T \ge 0$ is the unique operator with the property that $P =
T^2$. ∎
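The construction of $\sqrt{P}$ and the role of the sign choices can be seen in a $2 \times 2$ example. This Python sketch (standard library only) assumes a real symmetric matrix with two distinct eigenvalues, so that the spectral projections are $E_1 = (P - \lambda_2I)/(\lambda_1 - \lambda_2)$ and $E_2 = (P - \lambda_1I)/(\lambda_2 - \lambda_1)$; the matrix $P = S^TS$ with $S = \begin{pmatrix}1 & 2\\ 0 & 1\end{pmatrix}$ is a hypothetical example.

```python
import math

def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def spectral_pairs(P):
    """[(l1, E1), (l2, E2)] for a real symmetric 2 x 2 P with distinct eigenvalues."""
    p, q, s = P[0][0], P[0][1], P[1][1]
    mean, radius = (p + s) / 2, math.hypot((p - s) / 2, q)
    l1, l2 = mean + radius, mean - radius

    def projection(la, lb):
        # E = (P - lb I) / (la - lb)
        return [[(P[i][j] - (lb if i == j else 0.0)) / (la - lb) for j in range(2)]
                for i in range(2)]

    return [(l1, projection(l1, l2)), (l2, projection(l2, l1))]

def root_with_signs(P, signs=(1, 1)):
    """sum_j sign_j sqrt(l_j) E_j; the signs (1, 1) give sqrt(P)."""
    (l1, E1), (l2, E2) = spectral_pairs(P)
    c1, c2 = signs[0] * math.sqrt(l1), signs[1] * math.sqrt(l2)
    return [[c1 * E1[i][j] + c2 * E2[i][j] for j in range(2)] for i in range(2)]

def quad(T, v):
    """(Tv, v) for a real 2 x 2 matrix T and a real vector v."""
    return sum(T[i][j] * v[j] * v[i] for i in range(2) for j in range(2))

P = [[1.0, 2.0], [2.0, 5.0]]             # hypothetical: P = S^T S
T = root_with_signs(P)                   # the positive square root sqrt(P)
T_other = root_with_signs(P, (1, -1))    # also squares to P, but is not >= 0
```

Both $T$ and the mixed-sign operator square to $P$ and are Hermitian, but only $T$ satisfies $(Tv, v) \ge 0$ for every $v$, as the uniqueness part of Theorem 10.29(a) requires.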

We remark that in the particular case that $P$ is positive definite, $P =
S^\dagger S$ where $S$ is nonsingular. This means that $P$ is also nonsingular.

Finally, we are in a position to prove the last result of this section, the so- 
called polar decomposition (or factorization) of an operator. While we state 
and prove this theorem in terms of matrices, it should be obvious by now that 
it applies just as well to operators. 
Theorem 10.30 (Polar Decomposition) If $A \in M_n(\mathbb{C})$, then there exist
unique positive Hermitian matrices $H_1, H_2 \in M_n(\mathbb{C})$ and (not necessarily
unique) unitary matrices $U_1, U_2 \in M_n(\mathbb{C})$ such that $A = U_1H_1 = H_2U_2$. Moreover, $H_1 = (A^\dagger A)^{1/2}$ and $H_2 = (AA^\dagger)^{1/2}$. In addition, the matrices $U_1$ and $U_2$
are uniquely determined if and only if $A$ is nonsingular.

Proof Let $\lambda_1^2, \dots, \lambda_n^2$ be the eigenvalues of the positive Hermitian matrix
$A^\dagger A$, where each $\lambda_i \ge 0$, and assume the $\lambda_i$ are numbered so that $\lambda_i > 0$ for $i = 1, \dots, k$ and $\lambda_i = 0$
for $i = k + 1, \dots, n$ (see Theorem 10.9(d)). (Note that if $A$ is nonsingular, then
$A^\dagger A$ is positive definite and hence $k = n$.) Applying Corollary 1 of Theorem
10.15, we let $\{v_1, \dots, v_n\}$ be the corresponding orthonormal eigenvectors of
$A^\dagger A$. For each $i = 1, \dots, k$ we define the vectors $w_i = Av_i/\lambda_i$. Then
$$\begin{aligned} (w_i, w_j) &= (Av_i/\lambda_i, Av_j/\lambda_j) = (v_i, A^\dagger Av_j)/\lambda_i\lambda_j \\ &= (v_i, v_j)\lambda_j^2/\lambda_i\lambda_j = \delta_{ij}\lambda_j^2/\lambda_i\lambda_j \end{aligned}$$
so that $w_1, \dots, w_k$ are also orthonormal. We now extend these to an orthonormal basis $\{w_1, \dots, w_n\}$ for $\mathbb{C}^n$. If we define the columns of the matrices
$V, W \in M_n(\mathbb{C})$ by $V^i = v_i$ and $W^i = w_i$, then $V$ and $W$ will be unitary by
Theorem 10.7.

Defining the Hermitian matrix $D \in M_n(\mathbb{C})$ by
$$D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$
it is easy to see that the equations $Av_i = \lambda_iw_i$ (which also hold for $i > k$, since then $\|Av_i\|^2 = (v_i, A^\dagger Av_i) = \lambda_i^2 = 0$) may be written in matrix form as
$AV = WD$. Using the fact that $V$ and $W$ are unitary, we define $U_1 = WV^\dagger$ and
$H_1 = VDV^\dagger$ to obtain
$$A = WDV^\dagger = (WV^\dagger)(VDV^\dagger) = U_1H_1.$$
Since $\det(xI - VDV^\dagger) = \det(xI - D)$, we see that $H_1$ and $D$ have the same
nonnegative eigenvalues, and hence $H_1$ is a positive Hermitian matrix. We can
now apply this result to the matrix $A^\dagger$ to write $A^\dagger = \tilde U_1\tilde H_1$ or $A = \tilde H_1^\dagger\tilde U_1^\dagger =
\tilde H_1\tilde U_1^\dagger$. If we define $H_2 = \tilde H_1$ and $U_2 = \tilde U_1^\dagger$, then we obtain $A = H_2U_2$ as
desired.

We now observe that using $A = U_1H_1$ we may write
$$A^\dagger A = H_1U_1^\dagger U_1H_1 = (H_1)^2$$
and similarly
$$AA^\dagger = H_2U_2U_2^\dagger H_2 = (H_2)^2$$
so that $H_1$ and $H_2$ are unique even if $A$ is singular (by the uniqueness part of Theorem 10.29). Since $U_1$ and $U_2$ are
unitary, they are necessarily nonsingular, and hence $H_1$ and $H_2$ are nonsingular
if $A = U_1H_1 = H_2U_2$ is nonsingular. In this case, $U_1 = AH_1^{-1}$ and $U_2 = H_2^{-1}A$
will also be unique. On the other hand, suppose $A$ is singular. Then $k \ne n$ and
$w_{k+1}, \dots, w_n$ are not unique. This means that $U_1 = WV^\dagger$ (and similarly $U_2$) is
not unique. In other words, if $U_1$ and $U_2$ are unique, then $A$ must be nonsingular. ∎
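For a nonsingular matrix the proof gives $U_1 = AH_1^{-1}$ and $U_2 = H_2^{-1}A$, which is easy to compute in the $2 \times 2$ real case (where $\dagger$ is just the transpose). The sketch below (Python, standard library only) uses that route rather than the eigenvector construction in the proof, and assumes $A^TA$ has two distinct eigenvalues so that the two-projection formula for the square root applies; the matrix $A$ is hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def sqrt_psd(P):
    """Positive square root sum_j sqrt(l_j) E_j of a real symmetric 2 x 2 matrix.

    Assumes distinct nonnegative eigenvalues l1 > l2, so that
    E1 = (P - l2 I)/(l1 - l2) and E2 = (P - l1 I)/(l2 - l1)."""
    p, q, s = P[0][0], P[0][1], P[1][1]
    mean, radius = (p + s) / 2, math.hypot((p - s) / 2, q)
    l1, l2 = mean + radius, mean - radius
    c1, c2 = math.sqrt(l1), math.sqrt(l2)
    result = []
    for i in range(2):
        row = []
        for j in range(2):
            d = 1.0 if i == j else 0.0
            e1 = (P[i][j] - l2 * d) / (l1 - l2)
            e2 = (P[i][j] - l1 * d) / (l2 - l1)
            row.append(c1 * e1 + c2 * e2)
        result.append(row)
    return result

def inverse_2x2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def close(X, Y, tol=1e-10):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[1.0, 2.0], [3.0, 4.0]]                 # hypothetical nonsingular real matrix
H1 = sqrt_psd(matmul(transpose(A), A))       # H1 = (A^T A)^{1/2}
H2 = sqrt_psd(matmul(A, transpose(A)))       # H2 = (A A^T)^{1/2}
U1 = matmul(A, inverse_2x2(H1))              # U1 = A H1^{-1}
U2 = matmul(inverse_2x2(H2), A)              # U2 = H2^{-1} A
```

For this particular $A$ the last test also finds $U_1 = U_2$; the theorem itself only asserts that each factor is unique.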

Exercises 

1. Let $V$ be a unitary space and let $E \in L(V)$ be an orthogonal projection.

(a) Show directly that $E$ is a positive transformation.

(b) Show that $\|Ev\| \le \|v\|$ for all $v \in V$.

2. Prove that if $A$ and $B$ are commuting positive transformations, then $AB$ is
also positive.

3. This exercise is related to Exercise 7.5.5. Prove that any representation of
a finite group is equivalent to a unitary representation as follows:

(a) Consider the matrix $X = \sum_{a\in G}D^\dagger(a)D(a)$. Show that $X$ is Hermitian
and positive definite, and hence that $X = S^2$ for some Hermitian $S$.

(b) Show that $D(a)^\dagger XD(a) = X$.

(c) Show that $U(a) = SD(a)S^{-1}$ is a unitary representation.
Supplementary Exercises for Chapter 10 

1. Let $T$ be a linear transformation on a space $V$ with basis $\{e_1, \dots, e_n\}$. If
$T(e_i) = \sum_{j\ge i}a_{ji}e_j$ for all $i = 1, \dots, n$ and $T(e_1) \ne ce_1$ for any scalar $c$, show
that $T$ is not normal.

2. Let $A$ be a fixed $n \times n$ matrix, and let $B$ be any $n \times n$ matrix such that $A =
B^2$. Assume that $B$ is similar to a diagonal matrix and has nonnegative
eigenvalues $\lambda_1, \dots, \lambda_n$. Let $p(x)$ be a polynomial such that $p(\lambda_i^2) = \lambda_i$ for
each $i = 1, \dots, n$. Show that $p(A) = B$ and hence $B$ is unique. How does
this relate to our discussion of $\sqrt{P}$ for a positive operator $P$?

3. Describe all operators that are both unitary and positive.

4. Is it true that for any $A \in M_n(\mathbb{C})$, $AA^\dagger$ and $A^\dagger A$ are unitarily similar?
Explain.
5. In each case, indicate whether or not the statement is true or false and
give your reason.

(a) For any $A \in M_n(\mathbb{C})$, $AA^\dagger$ has all real eigenvalues.

(b) For any $A \in M_n(\mathbb{C})$, the eigenvalues of $AA^\dagger$ are of the form $|\lambda|^2$
where $\lambda$ is an eigenvalue of $A$.

(c) For any $A \in M_n(\mathbb{C})$, the eigenvalues of $AA^\dagger$ are nonnegative real
numbers.

(d) For any $A \in M_n(\mathbb{C})$, $AA^\dagger$ has the same eigenvalues as $A^\dagger A$ if $A$ is
nonsingular.

(e) For any $A \in M_n(\mathbb{C})$, $\operatorname{Tr}(AA^\dagger) = |\operatorname{Tr} A|^2$.

(f) For any $A \in M_n(\mathbb{C})$, $AA^\dagger$ is unitarily similar to a diagonal matrix.

(g) For any $A \in M_n(\mathbb{C})$, $AA^\dagger$ has $n$ linearly independent eigenvectors.

(h) For any $A \in M_n(\mathbb{C})$, the eigenvalues of $AA^\dagger$ are the same as the
eigenvalues of $A^\dagger A$.

(i) For any $A \in M_n(\mathbb{C})$, the Jordan form of $AA^\dagger$ is the same as the Jordan
form of $A^\dagger A$.

(j) For any $A \in M_n(\mathbb{C})$, the null space of $A^\dagger A$ is the same as the null
space of $A$.

6. Let $S$ and $T$ be normal operators on $V$. Show that there are bases $\{u_i\}$ and
$\{v_i\}$ for $V$ such that $[S]_u = [T]_v$ if and only if there are orthonormal bases
$\{u'_i\}$ and $\{v'_i\}$ such that $[S]_{u'} = [T]_{v'}$.

7. Let $T$ be normal and let $k > 0$ be an integer. Show that there is a normal $S$
such that $S^k = T$.

8. Let $N$ be normal and let $p(x)$ be a polynomial over $\mathbb{C}$. Show that $p(N)$ is
also normal.

9. Let $N$ be a normal operator on a unitary space $V$, let $W = \operatorname{Ker} N$, and let
$\tilde N$ be the transformation induced by $N$ on $V/W$. Show that $\tilde N$ is normal.
Show that $\tilde N^{-1}$ is also normal.

10. Discuss the following assertion: For any linear transformation $T$ on a
unitary space $V$, $TT^\dagger$ and $T^\dagger T$ have a common basis of eigenvectors.

11. Show that if $A$ and $B$ are real symmetric matrices and $A$ is positive definite, then $p(x) = \det(B - xA)$ has all real roots.
Multilinear Mappings and
Tensors



In this chapter we generalize our earlier discussion of bilinear forms, which 
leads in a natural manner to the concepts of tensors and tensor products. While 
we are aware that our approach is not the most general possible (which is to 
say it is not the most abstract), we feel that it is more intuitive, and hence the 
best way to approach the subject for the first time. In fact, our treatment is 
essentially all that is ever needed by physicists, engineers and applied 
mathematicians. More general treatments are discussed in advanced courses 
on abstract algebra. 

The basic idea is as follows. Given a vector space $V$ with basis $\{e_i\}$, we
defined the dual space $V^*$ (with basis $\{\omega^i\}$) as the space of linear functionals
on $V$. In other words, if $\phi = \sum_i\phi_i\omega^i \in V^*$ and $v = \sum_jv^je_j \in V$, then
$$\phi(v) = \langle\phi, v\rangle = \Big\langle\sum_i\phi_i\omega^i, \sum_jv^je_j\Big\rangle = \sum_{i,j}\phi_iv^j\delta^i_j = \sum_i\phi_iv^i.$$
Next we defined the space $\mathcal{B}(V)$ of all bilinear forms on $V$ (i.e., bilinear mappings on $V \times V$), and we showed (Theorem 9.10) that $\mathcal{B}(V)$ has a basis given
by $\{f^{ij} = \omega^i \otimes \omega^j\}$ where
$$f^{ij}(u, v) = \omega^i \otimes \omega^j(u, v) = \omega^i(u)\omega^j(v) = u^iv^j.$$
It is this definition of the $f^{ij}$ that we will now generalize to include linear functionals on spaces such as, for example, $V^* \times V^* \times V^* \times V \times V$.

11.1 DEFINITIONS 

Let $V$ be a finite-dimensional vector space over $\mathcal{F}$, and let $V^r$ denote the $r$-fold
Cartesian product $V \times V \times \cdots \times V$. In other words, an element of $V^r$ is an $r$-tuple $(v_1, \dots, v_r)$ where each $v_i \in V$. If $W$ is another vector space over $\mathcal{F}$,
then a mapping $T: V^r \to W$ is said to be multilinear if $T(v_1, \dots, v_r)$ is linear
in each variable. That is, $T$ is multilinear if for each $i = 1, \dots, r$ we have
$$T(v_1, \dots, av_i + bv'_i, \dots, v_r) = aT(v_1, \dots, v_i, \dots, v_r) + bT(v_1, \dots, v'_i, \dots, v_r)$$
for all $v_i, v'_i \in V$ and $a, b \in \mathcal{F}$. In the particular case that $W = \mathcal{F}$, the mapping
$T$ is variously called an $r$-linear form on $V$, or a multilinear form of degree
$r$ on $V$, or an $r$-tensor on $V$. The set of all $r$-tensors on $V$ will be denoted by
$\mathcal{T}_r(V)$. (It is also possible to discuss multilinear mappings that take their
values in $W$ rather than in $\mathcal{F}$. See Section 11.5.)

As might be expected, we define addition and scalar multiplication on
$\mathcal{T}_r(V)$ by
$$\begin{aligned} (S + T)(v_1, \dots, v_r) &= S(v_1, \dots, v_r) + T(v_1, \dots, v_r) \\ (aT)(v_1, \dots, v_r) &= aT(v_1, \dots, v_r) \end{aligned}$$
for all $S, T \in \mathcal{T}_r(V)$ and $a \in \mathcal{F}$. It should be clear that $S + T$ and $aT$ are both $r$-tensors. With these operations, $\mathcal{T}_r(V)$ becomes a vector space over $\mathcal{F}$. Note
that the particular case of $r = 1$ yields $\mathcal{T}_1(V) = V^*$, i.e., the dual space of $V$,
and if $r = 2$, then $\mathcal{T}_2(V)$ is just the space of bilinear forms on $V$.

Although this definition takes care of most of what we will need in this 
chapter, it is worth going through a more general (but not really more 
difficult) definition as follows. The basic idea is that a tensor is a scalar- 
valued multilinear function with variables in both V and V*. Note also that by 
Theorem 9.4, the space of linear functions on V* is V** which we view as 
simply V. For example, a tensor could be a function on the space V* x V x V. 
By convention, we will always write all V* variables before all V variables, 
so that, for example, a tensor on V x V* x V will be replaced by a tensor on 
V* x V x V. (However, not all authors adhere to this convention, so the reader 
should be very careful when reading the literature.) 
Without further ado, we define a tensor $T$ on $V$ to be a multilinear map on
$V^{*s} \times V^r$:
$$T: V^{*s} \times V^r = \underbrace{V^* \times \cdots \times V^*}_{s\text{ copies}} \times \underbrace{V \times \cdots \times V}_{r\text{ copies}} \to \mathcal{F}$$
where $r$ is called the covariant order and $s$ is called the contravariant order
of $T$. We shall say that a tensor of covariant order $r$ and contravariant order $s$
is of type (or rank) $\binom{s}{r}$. If we denote the set of all tensors of type $\binom{s}{r}$ by $\mathcal{T}_r^{\,s}(V)$,
then defining addition and scalar multiplication exactly as above, we see that
$\mathcal{T}_r^{\,s}(V)$ forms a vector space over $\mathcal{F}$. A tensor of type $\binom{0}{0}$ is defined to be a
scalar, and hence $\mathcal{T}_0^{\,0}(V) = \mathcal{F}$. A tensor of type $\binom{1}{0}$ is called a contravariant
vector, and a tensor of type $\binom{0}{1}$ is called a covariant vector (or simply a
covector). In order to distinguish between these types of vectors, we denote
the basis vectors for $V$ by a subscript (e.g., $e_i$), and the basis vectors for $V^*$ by
a superscript (e.g., $\omega^j$). Furthermore, we will generally leave off the $V$ and
simply write $\mathcal{T}_r$ or $\mathcal{T}_r^{\,s}$.

At this point we are virtually forced to introduce the so-called Einstein 
summation convention. This convention says that we are to sum over 
repeated indices in any vector or tensor expression where one index is a 
superscript and one is a subscript. Because of this, we write the vector components with indices in the opposite position from that of the basis vectors.
This is why we have been writing $v = \sum_iv^ie_i \in V$ and $\phi = \sum_j\phi_j\omega^j \in V^*$. Thus
we now simply write $v = v^ie_i$ and $\phi = \phi_j\omega^j$ where the summation is to be
understood. Generally the limits of the sum will be clear. However, we will
revert to the more complete notation if there is any possibility of ambiguity.

It is also worth emphasizing the trivial fact that the indices summed over
are just "dummy indices." In other words, we have $v^ie_i = v^je_j$ and so on.
Throughout this chapter we will be relabelling indices in this manner without 
further notice, and we will assume that the reader understands what we are 
doing. 

Suppose $T \in \mathcal{T}_r$, and let $\{e_1, \dots, e_n\}$ be a basis for $V$. For each $i = 1, \dots,
r$ we define a vector $v_i = e_ja^j{}_i$ where, as usual, $a^j{}_i \in \mathcal{F}$ is just the $j$th component
of the vector $v_i$. (Note that here the subscript $i$ is not a tensor index.) Using the
multilinearity of $T$ we see that
$$T(v_1, \dots, v_r) = T(e_{j_1}a^{j_1}{}_1, \dots, e_{j_r}a^{j_r}{}_r) = a^{j_1}{}_1\cdots a^{j_r}{}_r\,T(e_{j_1}, \dots, e_{j_r}).$$
The $n^r$ scalars $T(e_{j_1}, \dots, e_{j_r})$ are called the components of $T$ relative to the
basis $\{e_i\}$, and are denoted by $T_{j_1\cdots j_r}$. This terminology implies that there
exists a basis for $\mathcal{T}_r$ such that the $T_{j_1\cdots j_r}$ are just the components of $T$ with
respect to this basis. We now construct this basis, which will prove that $\mathcal{T}_r$ is
of dimension $n^r$.
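The component formula above can be exercised directly: a tensor in $\mathcal{T}_3$ on a 2-dimensional space is stored as its $n^r = 8$ components, and its value on arbitrary vectors is computed by the multilinear expansion. In this Python sketch (standard library only) the components are hypothetical integers and indices run from 0 rather than 1.

```python
from itertools import product

n, r = 2, 3        # hypothetical: a 3-tensor on a 2-dimensional space

# Hypothetical components T_{j1 j2 j3} = T(e_{j1}, e_{j2}, e_{j3}).
components = {idx: 1 + idx[0] + 2 * idx[1] + 5 * idx[2] - 3 * idx[0] * idx[2]
              for idx in product(range(n), repeat=r)}

def evaluate(components, vectors):
    """T(v_1, ..., v_r) = a^{j_1}_1 ... a^{j_r}_r T_{j_1 ... j_r}, summed over all indices.

    vectors[i][j] holds the jth component a^j_i of the vector v_i."""
    total = 0
    for idx, value in components.items():
        coeff = value
        for slot, j in enumerate(idx):
            coeff *= vectors[slot][j]
        total += coeff
    return total

def basis_vector(j):
    """Components of the basis vector e_j."""
    return [1 if k == j else 0 for k in range(n)]
```

The checks confirm that there are $n^r$ components, that $T(e_{j_1}, \dots, e_{j_r}) = T_{j_1\cdots j_r}$, and that the value is linear in a single slot when the others are held fixed.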

(We will show formally in Section 11.10 that the Kronecker symbols $\delta^i_j$
are in fact the components of a tensor, and that these components are the same
in any coordinate system. However, for all practical purposes we continue to
use the $\delta^i_j$ simply as a notational device, and hence we place no importance on
the position of the indices, i.e., $\delta^i_j = \delta_j^i$ etc.)

For each collection $\{i_1, \dots, i_r\}$ (where $1 \le i_k \le n$), we define the tensor
$\Omega^{i_1\cdots i_r}$ (not simply the components of a tensor $\Omega$) to be that element of $\mathcal{T}_r$
whose values on the basis $\{e_i\}$ for $V$ are given by
$$\Omega^{i_1\cdots i_r}(e_{j_1}, \dots, e_{j_r}) = \delta^{i_1}_{j_1}\cdots\delta^{i_r}_{j_r}$$
and whose values on an arbitrary collection $\{v_1, \dots, v_r\}$ of vectors are given
by multilinearity as
$$\begin{aligned} \Omega^{i_1\cdots i_r}(v_1, \dots, v_r) &= \Omega^{i_1\cdots i_r}(e_{j_1}a^{j_1}{}_1, \dots, e_{j_r}a^{j_r}{}_r) \\ &= a^{j_1}{}_1\cdots a^{j_r}{}_r\,\Omega^{i_1\cdots i_r}(e_{j_1}, \dots, e_{j_r}) \\ &= a^{j_1}{}_1\cdots a^{j_r}{}_r\,\delta^{i_1}_{j_1}\cdots\delta^{i_r}_{j_r} \\ &= a^{i_1}{}_1\cdots a^{i_r}{}_r. \end{aligned}$$
That this does indeed define a tensor is guaranteed by this last equation which
shows that each $\Omega^{i_1\cdots i_r}$ is in fact linear in each variable (since $v_1 + v'_1 =
(a^{j_1}{}_1 + a'^{j_1}{}_1)e_{j_1}$, etc.). To prove that the $n^r$ tensors $\Omega^{i_1\cdots i_r}$ form a basis for $\mathcal{T}_r$,
we must show that they are linearly independent and span $\mathcal{T}_r$.

Suppose that $\alpha_{i_1\cdots i_r}\Omega^{i_1\cdots i_r} = 0$ where each $\alpha_{i_1\cdots i_r} \in \mathcal{F}$. From the
definition of $\Omega^{i_1\cdots i_r}$, we see that applying this to any $r$-tuple $(e_{j_1}, \dots, e_{j_r})$ of
basis vectors yields $\alpha_{j_1\cdots j_r} = 0$. Since this is true for every such $r$-tuple, it
follows that $\alpha_{i_1\cdots i_r} = 0$ for every $r$-tuple of indices $(i_1, \dots, i_r)$, and hence the
$\Omega^{i_1\cdots i_r}$ are linearly independent.

Now let $T_{i_1\cdots i_r} = T(e_{i_1}, \dots, e_{i_r})$ and consider the tensor
$$T_{i_1\cdots i_r}\Omega^{i_1\cdots i_r}$$
in $\mathcal{T}_r$. Using the definition of $\Omega^{i_1\cdots i_r}$, we see that both $T_{i_1\cdots i_r}\Omega^{i_1\cdots i_r}$ and $T$
yield the same result when applied to any $r$-tuple $(e_{j_1}, \dots, e_{j_r})$ of basis
vectors, and hence they must be equal as multilinear functions on $V^r$. This
shows that $\{\Omega^{i_1\cdots i_r}\}$ spans $\mathcal{T}_r$.
While we have treated only the space $\mathcal{T}_r$, it is not any more difficult to
treat the general space $\mathcal{T}_r^{\,s}$. Thus, if $\{e_i\}$ is a basis for $V$, $\{\omega^j\}$ is a basis for $V^*$
and $T \in \mathcal{T}_r^{\,s}$, we define the components of $T$ (relative to the given bases) by
$$T^{i_1\cdots i_s}{}_{j_1\cdots j_r} = T(\omega^{i_1}, \dots, \omega^{i_s}, e_{j_1}, \dots, e_{j_r}).$$
Defining the $n^{r+s}$ analogous tensors $\Omega_{i_1\cdots i_s}{}^{j_1\cdots j_r}$, it is easy to mimic the above
procedure and hence prove the following result.

Theorem 11.1 The set $\mathcal{T}_r^{\,s}$ of all tensors of type $\binom{s}{r}$ on $V$ forms a vector space
of dimension $n^{r+s}$.

Proof This is Exercise 11.1.1. ∎

Since a tensor $T \in \mathcal{T}_r^{\,s}$ is a function on $V^{*s} \times V^r$, it would be nice if we
could write a basis (e.g., $\Omega_{i_1\cdots i_s}{}^{j_1\cdots j_r}$) for $\mathcal{T}_r^{\,s}$ in terms of the bases $\{e_i\}$ for $V$ and
$\{\omega^j\}$ for $V^*$. We now show that this is easy to accomplish by defining a
product on $\mathcal{T}_r^{\,s}$, called the tensor product. The reader is cautioned not to be
intimidated by the notational complexities, since the concepts involved are
really quite simple.

Suppose that $S \in \mathcal{T}_{r_1}^{\,s_1}$ and $T \in \mathcal{T}_{r_2}^{\,s_2}$. Let $u_1, \dots, u_{r_1}, v_1, \dots, v_{r_2}$ be
vectors in $V$, and $\alpha^1, \dots, \alpha^{s_1}, \beta^1, \dots, \beta^{s_2}$ be covectors in $V^*$. Note that the
product
$$S(\alpha^1, \dots, \alpha^{s_1}, u_1, \dots, u_{r_1})\,T(\beta^1, \dots, \beta^{s_2}, v_1, \dots, v_{r_2})$$
is linear in each of its $r_1 + s_1 + r_2 + s_2$ variables. Hence we define the tensor
product $S \otimes T \in \mathcal{T}_{r_1+r_2}^{\,s_1+s_2}$ (read "S tensor T") by
$$(S \otimes T)(\alpha^1, \dots, \alpha^{s_1}, \beta^1, \dots, \beta^{s_2}, u_1, \dots, u_{r_1}, v_1, \dots, v_{r_2}) = S(\alpha^1, \dots, \alpha^{s_1}, u_1, \dots, u_{r_1})\,T(\beta^1, \dots, \beta^{s_2}, v_1, \dots, v_{r_2}).$$
It is easily shown that the tensor product is both associative and distributive (i.e., bilinear in both factors). In other words, for any scalar $a \in \mathcal{F}$ and
tensors $R$, $S$ and $T$ such that the following formulas make sense, we have
$$\begin{aligned} (R \otimes S) \otimes T &= R \otimes (S \otimes T) \\ R \otimes (S + T) &= R \otimes S + R \otimes T \\ (R + S) \otimes T &= R \otimes T + S \otimes T \\ (aS) \otimes T &= S \otimes (aT) = a(S \otimes T) \end{aligned}$$
(see Exercise 11.1.2). Because of the associativity property (which is a consequence of associativity in $\mathcal{F}$), we will drop the parentheses in expressions such
as the top equation and simply write $R \otimes S \otimes T$. This clearly extends to any
finite product of tensors. It is important to note, however, that the tensor
product is most certainly not commutative.
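For covariant tensors the definition of $S \otimes T$ reduces to a rule on components, $(S \otimes T)_{i_1\cdots i_{r_1}j_1\cdots j_{r_2}} = S_{i_1\cdots i_{r_1}}T_{j_1\cdots j_{r_2}}$, obtained by evaluating both sides on basis vectors. The Python sketch below (standard library only) stores covariant tensors as dictionaries from index tuples to components; the covectors $S$ and $T$ are hypothetical, and the checks show $S \otimes T \ne T \otimes S$ while associativity and bilinearity hold.

```python
def tensor_product(S, T):
    """Components of S (x) T for covariant tensors stored as {index tuple: value}.

    (S (x) T)_{i_1...i_r1 j_1...j_r2} = S_{i_1...i_r1} T_{j_1...j_r2}."""
    return {i + j: S[i] * T[j] for i in S for j in T}

def evaluate(components, vectors):
    """Evaluate a covariant tensor on vectors given by their components."""
    total = 0
    for idx, value in components.items():
        coeff = value
        for slot, j in enumerate(idx):
            coeff *= vectors[slot][j]
        total += coeff
    return total

# Hypothetical covectors on a 2-dimensional space (indices start at 0):
# S = omega^1 + 2 omega^2 and T = 3 omega^1 - omega^2.
S = {(0,): 1, (1,): 2}
T = {(0,): 3, (1,): -1}
ST, TS = tensor_product(S, T), tensor_product(T, S)
```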

Now let $\{e_1, \dots, e_n\}$ be a basis for $V$, and let $\{\omega^j\}$ be its dual basis. We
claim that the set $\{\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}\}$ of tensor products where $1 \le j_k \le n$ forms
a basis for the space $\mathcal{T}_r$ of covariant tensors. To see this, we note that from the
definitions of tensor product and dual space, we have
$$\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}(e_{i_1}, \dots, e_{i_r}) = \omega^{j_1}(e_{i_1})\cdots\omega^{j_r}(e_{i_r}) = \delta^{j_1}_{i_1}\cdots\delta^{j_r}_{i_r}$$
so that $\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}$ and $\Omega^{j_1\cdots j_r}$ take the same values on the $r$-tuples
$(e_{i_1}, \dots, e_{i_r})$, and hence they must be equal as multilinear functions on $V^r$.
Since we showed above that $\{\Omega^{j_1\cdots j_r}\}$ forms a basis for $\mathcal{T}_r$, we have proved
that $\{\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}\}$ also forms a basis for $\mathcal{T}_r$.
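The basis claim can also be checked mechanically. Building $\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}$ by repeated tensor products, the Python sketch below (standard library only, indices starting at 0, hypothetical components $T_{ij}$) confirms that these tensors take the Kronecker-delta values on basis vectors and that $T = T_{j_1\cdots j_r}\,\omega^{j_1} \otimes \cdots \otimes \omega^{j_r}$.

```python
from itertools import product

n, r = 2, 2          # hypothetical dimension and covariant order

def tensor_product(S, T):
    """Components of S (x) T for covariant tensors stored as {index tuple: value}."""
    return {i + j: S[i] * T[j] for i in S for j in T}

def evaluate(components, vectors):
    """Evaluate a covariant tensor on vectors given by their components."""
    total = 0
    for idx, value in components.items():
        coeff = value
        for slot, j in enumerate(idx):
            coeff *= vectors[slot][j]
        total += coeff
    return total

def basis_vector(i):
    """Components of e_i."""
    return [1 if k == i else 0 for k in range(n)]

def dual_covector(j):
    """omega^j as components: omega^j(e_i) = delta^j_i."""
    return {(i,): 1 if i == j else 0 for i in range(n)}

def basis_tensor(js):
    """omega^{j_1} (x) ... (x) omega^{j_r} built by repeated tensor products."""
    result = {(): 1}                 # the scalar 1, a tensor of order 0
    for j in js:
        result = tensor_product(result, dual_covector(j))
    return result

def expand(T):
    """Rebuild T as T_{j_1...j_r} omega^{j_1} (x) ... (x) omega^{j_r}."""
    total = {idx: 0 for idx in product(range(n), repeat=r)}
    for js, coeff in T.items():
        for idx, value in basis_tensor(js).items():
            total[idx] += coeff * value
    return total

T = {(0, 0): 4, (0, 1): -1, (1, 0): 2, (1, 1): 7}    # hypothetical components T_ij
```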

The method of the previous paragraph is readily extended to the space $\mathcal{T}_r^s$. We must recall, however, that we are treating $V^{**}$ and $V$ as the same space. If $\{e_i\}$ is a basis for $V$, then the dual basis $\{\omega^j\}$ for $V^*$ was defined by $\omega^j(e_i) = \langle \omega^j, e_i \rangle = \delta^j_i$. Similarly, given a basis $\{\omega^j\}$ for $V^*$, we define the basis $\{e_i\}$ for $V^{**} = V$ by $e_i(\omega^j) = \omega^j(e_i) = \delta^j_i$. In fact, using tensor products, it is now easy to repeat Theorem 11.1 in its most useful form. Note also that the next theorem shows that a tensor is determined by its values on the bases $\{e_i\}$ and $\{\omega^j\}$.

Theorem 11.2 Let $V$ have basis $\{e_1, \dots, e_n\}$, and let $V^*$ have the corresponding dual basis $\{\omega^1, \dots, \omega^n\}$. Then a basis for $\mathcal{T}_r^s$ is given by the collection

$$\{e_{i_1} \otimes \cdots \otimes e_{i_s} \otimes \omega^{j_1} \otimes \cdots \otimes \omega^{j_r}\}$$

where $1 \le j_1, \dots, j_r, i_1, \dots, i_s \le n$, and hence $\dim \mathcal{T}_r^s = n^{r+s}$.

Proof In view of Theorem 11.1, all that is needed is to show that

$$e_{i_1} \otimes \cdots \otimes e_{i_s} \otimes \omega^{j_1} \otimes \cdots \otimes \omega^{j_r} = \Omega_{i_1 \cdots i_s}^{j_1 \cdots j_r} .$$

The details are left to the reader (see Exercise 11.1.1). ∎

Since the components of a tensor $T$ are defined with respect to a particular basis (and dual basis), we might ask about the relationship between the components of $T$ relative to two different bases. Using the multilinearity of tensors, this is a simple problem to solve.

First, let $\{e_i\}$ be a basis for $V$ and let $\{\omega^j\}$ be its dual basis. If $\{\bar e_i\}$ is another basis for $V$, then there exists a nonsingular transition matrix $A = (a^j{}_i)$ such that

$$\bar e_i = e_j\,a^j{}_i . \qquad (1)$$

(We emphasize that $a^j{}_i$ is only a matrix, not a tensor. Note also that our definition of the matrix of a linear transformation given in Section 5.3 shows that $a^j{}_i$ is the element of $A$ in the $j$th row and $i$th column.) Using $\langle \omega^i, e_j \rangle = \delta^i_j$ we have

$$\langle \omega^i, \bar e_k \rangle = \langle \omega^i, e_j\,a^j{}_k \rangle = a^j{}_k \langle \omega^i, e_j \rangle = a^j{}_k\,\delta^i_j = a^i{}_k .$$

Let us denote the inverse of the matrix $A = (a^i{}_j)$ by $A^{-1} = B = (b^i{}_j)$. In other words, $a^i{}_j b^j{}_k = \delta^i_k$ and $b^i{}_j a^j{}_k = \delta^i_k$. Multiplying $\langle \omega^i, \bar e_k \rangle = a^i{}_k$ by $b^j{}_i$ and summing on $i$ yields

$$\langle b^j{}_i\,\omega^i, \bar e_k \rangle = b^j{}_i\,a^i{}_k = \delta^j_k .$$

But the basis $\{\bar\omega^j\}$ dual to $\{\bar e_i\}$ also must satisfy $\langle \bar\omega^j, \bar e_k \rangle = \delta^j_k$, and hence comparing this with the previous equation shows that the dual basis vectors transform as

$$\bar\omega^j = b^j{}_i\,\omega^i . \qquad (2)$$

The reader should compare this carefully with (1). We say that the dual basis vectors transform oppositely (i.e., use the inverse transformation matrix) to the basis vectors. It is also worth emphasizing that if the nonsingular transition matrix from the basis $\{e_i\}$ to the basis $\{\bar e_i\}$ is given by $A$, then (according to the same convention given in Section 5.4) the corresponding nonsingular transition matrix from the basis $\{\omega^i\}$ to the basis $\{\bar\omega^i\}$ is given by $B^T = (A^{-1})^T$. We leave it to the reader to write out equations (1) and (2) in matrix notation to show that this is true (see Exercise 11.1.3).
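Equations (1) and (2) can be checked in exact arithmetic. The Python sketch below works in coordinates relative to $\{e_i\}$, so that $\bar e_i$ is the $i$th column of $A$ and $\bar\omega^j$ is the $j$th row of $B = A^{-1}$; the particular matrix is a hypothetical example, and the small Gauss-Jordan routine `inverse` is our own helper rather than a library call.

```python
from fractions import Fraction


def inverse(A):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(k for k in range(col, n) if M[k][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col and M[k][col] != 0:
                factor = M[k][col]
                M[k] = [x - factor * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]


def pair(covector, vector):
    """<alpha, v> = alpha_i v^i."""
    return sum(a * x for a, x in zip(covector, vector))


A = [[2, 1, 0], [1, 1, 0], [0, 3, 1]]  # a^j_i = A[j][i]: row j, column i
B = inverse(A)                          # b^i_j = B[i][j]
n = len(A)

# e_bar_i = e_j a^j_i has components (a^1_i, ..., a^n_i): the i-th column of A.
e_bar = [[A[j][i] for j in range(n)] for i in range(n)]
# w_bar^j = b^j_i w^i has components (b^j_1, ..., b^j_n): the j-th row of B.
omega_bar = [B[j] for j in range(n)]

table = [[pair(omega_bar[j], e_bar[k]) for k in range(n)] for j in range(n)]
print([[int(x) for x in row] for row in table])  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Listing the coefficients of each $\bar\omega^j$ as a column instead of a row gives the matrix $B^T$, in line with the remark about $(A^{-1})^T$.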

We now return to the question of the relationship between the components of a tensor in two different bases. For definiteness, we will consider a tensor $T \in \mathcal{T}_1^2$. The analogous result for an arbitrary tensor in $\mathcal{T}_r^s$ will be quite obvious. Let $\{e_i\}$ and $\{\omega^j\}$ be a basis and dual basis for $V$ and $V^*$ respectively. Now consider another pair of bases $\{\bar e_i\}$ and $\{\bar\omega^j\}$ where $\bar e_i = e_j a^j{}_i$ and $\bar\omega^i = b^i{}_j\,\omega^j$. Then we have $T^{ij}{}_k = T(\omega^i, \omega^j, e_k)$ as well as $\bar T^{pq}{}_r = T(\bar\omega^p, \bar\omega^q, \bar e_r)$, and therefore

$$\bar T^{pq}{}_r = T(\bar\omega^p, \bar\omega^q, \bar e_r) = b^p{}_i\,b^q{}_j\,a^k{}_r\,T(\omega^i, \omega^j, e_k) = b^p{}_i\,b^q{}_j\,a^k{}_r\,T^{ij}{}_k .$$

This is the classical law of transformation of the components of a tensor of type $\binom{2}{1}$. It should be kept in mind that $(a^i{}_j)$ and $(b^i{}_j)$ are inverse matrices to each other. (In fact, this equation is frequently taken as the definition of a tensor, at least in older texts. In other words, according to this approach, any quantity with this transformation property is defined to be a tensor.)

In particular, the components $v^i$ of a vector $v = v^i e_i$ transform as

$$\bar v^i = b^i{}_j\,v^j$$

while the components $\alpha_i$ of a covector $\alpha = \alpha_i\,\omega^i$ transform as

$$\bar\alpha_i = \alpha_j\,a^j{}_i .$$

We leave it to the reader to verify that these transformation laws lead to the self-consistent formulas $v = v^i e_i = \bar v^j \bar e_j$ and $\alpha = \alpha_i\,\omega^i = \bar\alpha_j\,\bar\omega^j$, as we should expect (see Exercise 11.1.4).
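The following plain-Python sketch applies these two laws to a hypothetical $2 \times 2$ transition matrix (with the convention $a^j{}_i$ = `A[j][i]`, and $B = A^{-1}$ written out by hand) and confirms that $\bar v^j \bar e_j$ reproduces $v$ and that $\langle \alpha, v \rangle$ comes out the same in both bases.

```python
A = [[2, 1], [1, 1]]    # a^j_i = A[j][i]; det A = 1
B = [[1, -1], [-1, 2]]  # b^i_j = B[i][j]; B is the inverse of A
n = 2

v = [3, 5]       # components v^i relative to {e_i}
alpha = [4, -2]  # components alpha_i relative to {w^i}

v_bar = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]          # v_bar^i = b^i_j v^j
alpha_bar = [sum(alpha[j] * A[j][i] for j in range(n)) for i in range(n)]  # alpha_bar_i = alpha_j a^j_i

# e_bar_j has components A[k][j] relative to {e_k}, so v_bar^j e_bar_j has components:
v_rebuilt = [sum(A[k][j] * v_bar[j] for j in range(n)) for k in range(n)]

print(v_bar, alpha_bar, v_rebuilt)  # [-2, 7] [6, 2] [3, 5]
```

The components change, but the vector they describe and the pairing $\alpha_i v^i = \bar\alpha_i \bar v^i = 2$ do not.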

We point out that these transformation laws are the origin of the terms "contravariant" and "covariant." This is because the components of a vector transform oppositely ("contravariant") to the basis vectors $e_i$, while the components of dual vectors transform the same as ("covariant") these basis vectors.

It is also worth mentioning that many authors use a prime (or some other method such as a different type of letter) for distinguishing different bases. In other words, if we have a basis $\{e_i\}$ and we wish to transform to another basis which we denote by $\{e_{i'}\}$, then this is accomplished by a transformation matrix $(a^i{}_{j'})$ so that $e_{i'} = e_j\,a^j{}_{i'}$. In this case, we would write $\omega^{i'} = a^{i'}{}_j\,\omega^j$ where $(a^{i'}{}_j)$ is the inverse of $(a^i{}_{j'})$. In this notation, the transformation law for the tensor $T$ used above would be written as

$$T^{p'q'}{}_{r'} = a^{p'}{}_i\,a^{q'}{}_j\,a^k{}_{r'}\,T^{ij}{}_k .$$

Note that specifying the components of a tensor with respect to one coordinate system allows the determination of its components with respect to any other coordinate system. Because of this, we shall frequently refer to a tensor by its "generic" components. In other words, we will refer to, e.g., $T^{ij}{}_k$ as a "tensor" and not the more accurate description as the "components of the tensor $T$."

Example 11.1 For those readers who may have seen a classical treatment of tensors and have had a course in advanced calculus, we will now show how our more modern approach agrees with the classical.

If $\{x^i\}$ is a local coordinate system on a differentiable manifold $X$, then a (tangent) vector field $v(x)$ on $X$ is defined as the derivative function $v = v^i(\partial/\partial x^i)$, so that $v(f) = v^i(\partial f/\partial x^i)$ for every smooth function $f: X \to \mathbb{R}$ (and where each $v^i$ is a function of position $x \in X$, i.e., $v^i = v^i(x)$). Since every vector at $x \in X$ can in this manner be written as a linear combination of the $\partial/\partial x^i$, we see that $\{\partial/\partial x^i\}$ forms a basis for the tangent space at $x$.

We now define the differential $df$ of a function by $df(v) = v(f)$, and thus $df(v)$ is just the directional derivative of $f$ in the direction of $v$. Note that

$$dx^i(v) = v(x^i) = v^j \frac{\partial x^i}{\partial x^j} = v^j \delta^i_j = v^i$$

and hence $df(v) = v^i(\partial f/\partial x^i) = (\partial f/\partial x^i)\,dx^i(v)$. Since $v$ was arbitrary, we obtain the familiar elementary formula $df = (\partial f/\partial x^i)\,dx^i$. Furthermore, we see that

$$dx^i\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial x^i}{\partial x^j} = \delta^i_j$$

so that $\{dx^i\}$ forms the basis dual to $\{\partial/\partial x^i\}$. In summary then, relative to the local coordinate system $\{x^i\}$, we define a basis $\{e_i = \partial/\partial x^i\}$ for a (tangent) space $V$ along with the dual basis $\{\omega^j = dx^j\}$ for the (cotangent) space $V^*$.

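If we now go to a new coordinate system $\{\bar x^i\}$ in the same coordinate patch, then from calculus we obtain

$$\frac{\partial}{\partial \bar x^i} = \frac{\partial x^j}{\partial \bar x^i}\frac{\partial}{\partial x^j}$$

so that the expression $\bar e_i = e_j a^j{}_i$ implies $a^j{}_i = \partial x^j/\partial \bar x^i$. Similarly, we also have

$$d\bar x^i = \frac{\partial \bar x^i}{\partial x^j}\,dx^j$$

so that $\bar\omega^i = b^i{}_j\,\omega^j$ implies $b^i{}_j = \partial \bar x^i/\partial x^j$. Note that the chain rule from calculus shows us that

$$a^i{}_k\,b^k{}_j = \frac{\partial x^i}{\partial \bar x^k}\frac{\partial \bar x^k}{\partial x^j} = \frac{\partial x^i}{\partial x^j} = \delta^i_j$$

and thus $(b^i{}_j)$ is indeed the inverse matrix to $(a^i{}_j)$.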

Using these results in the above expression for $\bar T^{pq}{}_r$, we see that

$$\bar T^{pq}{}_r = \frac{\partial \bar x^p}{\partial x^i}\frac{\partial \bar x^q}{\partial x^j}\frac{\partial x^k}{\partial \bar x^r}\,T^{ij}{}_k$$

which is just the classical definition of the transformation law for a tensor of type $\binom{2}{1}$.

We also remark that in older texts, a contravariant vector is defined to have the same transformation properties as the expression $d\bar x^i = (\partial \bar x^i/\partial x^j)\,dx^j$, while a covariant vector is defined to have the same transformation properties as the expression $\partial/\partial \bar x^i = (\partial x^j/\partial \bar x^i)\,\partial/\partial x^j$. ∆

Finally, let us define a simple classical tensor operation that is frequently quite useful. To begin with, we have seen that the result of operating on a vector $v = v^i e_i \in V$ with a dual vector $\alpha = \alpha_j\,\omega^j \in V^*$ is just $\langle \alpha, v \rangle = \alpha_j v^i \langle \omega^j, e_i \rangle = \alpha_j v^i \delta^j_i = \alpha_i v^i$. This is sometimes called the contraction of $\alpha$ with $v$. We leave it to the reader to show that the contraction is independent of the particular coordinate system used (see Exercise 11.1.5).

If we start with tensors of higher order, then we can perform the same sort of operation. For example, if we have $S \in \mathcal{T}_2^1$ with components $S^i{}_{jk}$ and $T \in \mathcal{T}^2$ with components $T^{pq}$, then we can form the $\binom{2}{1}$ tensor with components $S^i{}_{jk}T^{jq}$, or a different $\binom{2}{1}$ tensor with components $S^i{}_{jk}T^{pk}$, and so forth. This operation is also called contraction. Note that if we start with a $\binom{1}{1}$ tensor $T$, then we can contract the components of $T$ to obtain the scalar $T^i{}_i$. This is called the trace of $T$.
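As a quick numerical illustration, here is a sketch assuming a $\binom{1}{1}$ tensor is stored as a matrix `T[i][j]` $= T^i{}_j$. Under a change of basis its components become $\bar T^i{}_j = b^i{}_k\,T^k{}_l\,a^l{}_j$ (one upper index, transformed with $b$; one lower index, transformed with $a$), and the trace $T^i{}_i$ does not change. The matrices are hypothetical; `B` is the inverse of `A`.

```python
A = [[2, 1, 0], [1, 1, 0], [0, 3, 1]]     # a^i_j = A[i][j]
B = [[1, -1, 0], [-1, 2, 0], [3, -6, 1]]  # b^i_j = B[i][j], the inverse of A
T = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]    # components T^i_j in the basis {e_i}
n = 3

# T_bar^i_j = b^i_k T^k_l a^l_j
T_bar = [[sum(B[i][k] * T[k][l] * A[l][j] for k in range(n) for l in range(n))
          for j in range(n)] for i in range(n)]

trace = sum(T[i][i] for i in range(n))
trace_bar = sum(T_bar[i][i] for i in range(n))
print(trace, trace_bar)  # 16 16
```

In matrix language $\bar T = A^{-1} T A$, and the invariance of the contraction $T^i{}_i$ is the familiar invariance of the trace under similarity.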

Exercises

1. (a) Prove Theorem 11.1.
(b) Prove Theorem 11.2.

2. Prove the four associative and distributive properties of the tensor product given in the text following Theorem 11.1.

3. If the nonsingular transition matrix from a basis $\{e_i\}$ to a basis $\{\bar e_i\}$ is given by $A = (a^i{}_j)$, show that the transition matrix from the corresponding dual bases $\{\omega^i\}$ and $\{\bar\omega^i\}$ is given by $(A^{-1})^T$.

4. Using the transformation matrices $(a^i{}_j)$ and $(b^i{}_j)$ for the bases $\{e_i\}$ and $\{\bar e_i\}$ and the corresponding dual bases $\{\omega^i\}$ and $\{\bar\omega^i\}$, verify that $v = v^i e_i = \bar v^j \bar e_j$ and $\alpha = \alpha_i\,\omega^i = \bar\alpha_j\,\bar\omega^j$.

5. If $v \in V$ and $\alpha \in V^*$, show that $\langle \alpha, v \rangle$ is independent of the particular basis chosen for $V$. Generalize this to arbitrary tensors.

6. Let $A_i$ be a covariant vector field (i.e., $A_i = A_i(x)$) with the transformation rule

$$\bar A_i = \frac{\partial x^j}{\partial \bar x^i}\,A_j .$$

Show that the quantity $\partial_j A_i = \partial A_i/\partial x^j$ does not define a tensor, but that $F_{ij} = \partial_i A_j - \partial_j A_i$ is in fact a second-rank tensor.

11.2 SPECIAL TYPES OF TENSORS

In order to obtain some of the most useful results concerning tensors, we turn our attention to the space $\mathcal{T}_r$ of covariant tensors on $V$. Generalizing our earlier definition for bilinear forms, we say that a tensor $S \in \mathcal{T}_r$ is symmetric if for each pair $(i, j)$ with $1 \le i, j \le r$ and all $v_i \in V$ we have

$$S(v_1, \dots, v_i, \dots, v_j, \dots, v_r) = S(v_1, \dots, v_j, \dots, v_i, \dots, v_r) .$$

Similarly, $A \in \mathcal{T}_r$ is said to be antisymmetric (or skew-symmetric or alternating) if

$$A(v_1, \dots, v_i, \dots, v_j, \dots, v_r) = -A(v_1, \dots, v_j, \dots, v_i, \dots, v_r) .$$

Note this definition implies that $A(v_1, \dots, v_r) = 0$ if any two of the $v_i$ are identical. In fact, this was the original definition of an alternating bilinear form. Furthermore, we also see that $A(v_1, \dots, v_r) = 0$ if any $v_i$ is a linear combination of the rest of the $v_j$. In particular, this means that we must always have $r \le \dim V$ if we are to have a nonzero antisymmetric tensor of type $\binom{0}{r}$ on $V$.

It is easy to see that if $S_1, S_2 \in \mathcal{T}_r$ are symmetric, then so is $aS_1 + bS_2$ where $a, b \in \mathcal{F}$. Similarly, $aA_1 + bA_2$ is antisymmetric. Therefore the symmetric tensors form a subspace of $\mathcal{T}_r$ which we denote by $\Sigma^r(V)$, and the antisymmetric tensors form another subspace of $\mathcal{T}_r$ which is denoted by $\Lambda^r(V)$ (some authors denote this space by $\Lambda^r(V^*)$). Elements of $\Lambda^r(V)$ are generally called exterior $r$-forms, or simply $r$-forms. According to this terminology, the basis vectors $\{\omega^i\}$ for $V^*$ are referred to as basis 1-forms. Note that the only element common to both of these subspaces is the zero tensor.

A particularly important example of an antisymmetric tensor is the determinant function $\det \in \mathcal{T}_n(\mathbb{R}^n)$ (see Theorem 4.9 and the discussion preceding it). Note also that the definition of a symmetric tensor translates into the obvious requirement that (e.g., in the particular case of $\mathcal{T}_2$) $S_{ij} = S_{ji}$, while an antisymmetric tensor obeys $A_{ij} = -A_{ji}$. These definitions can also be extended to include contravariant tensors, although we shall have little need to do so.

It will be extremely convenient for us to now incorporate the treatment of permutation groups given in Section 1.2. In terms of any permutation $\sigma \in S_r$, we may rephrase the above definitions as follows. We say that $S \in \mathcal{T}_r$ is symmetric if for every collection $v_1, \dots, v_r \in V$ and each $\sigma \in S_r$ we have

$$S(v_1, \dots, v_r) = S(v_{\sigma 1}, \dots, v_{\sigma r}) .$$

Similarly, we say that $A \in \mathcal{T}_r$ is antisymmetric (or alternating) if either

$$A(v_1, \dots, v_r) = (\operatorname{sgn}\sigma)\,A(v_{\sigma 1}, \dots, v_{\sigma r})$$

or

$$A(v_{\sigma 1}, \dots, v_{\sigma r}) = (\operatorname{sgn}\sigma)\,A(v_1, \dots, v_r)$$

where the last equation follows from the first since $(\operatorname{sgn}\sigma)^2 = 1$. Note that even if $S, T \in \Sigma^r(V)$ are both symmetric, it need not be true that $S \otimes T$ is symmetric (i.e., in general $S \otimes T \notin \Sigma^{r+r}(V)$). For example, if $S_{ij} = S_{ji}$ and $T_{pq} = T_{qp}$, it does not necessarily follow that $S_{ij}T_{pq} = S_{ip}T_{jq}$. It is also clear that if $A, B \in \Lambda^r(V)$, then we do not necessarily have $A \otimes B \in \Lambda^{r+r}(V)$.

Example 11.2 Suppose $\alpha \in \Lambda^n(V)$, let $\{e_1, \dots, e_n\}$ be a basis for $V$, and for each $i = 1, \dots, n$ let $v_i = e_j\,a^j{}_i$ where $a^j{}_i \in \mathcal{F}$. Then, using the multilinearity of $\alpha$, we may write

$$\alpha(v_1, \dots, v_n) = a^{j_1}{}_1 \cdots a^{j_n}{}_n\,\alpha(e_{j_1}, \dots, e_{j_n})$$

where the sums are over all $1 \le j_k \le n$. But $\alpha \in \Lambda^n(V)$ is antisymmetric, and hence $(e_{j_1}, \dots, e_{j_n})$ must be a permutation of $(e_1, \dots, e_n)$ in order that the $e_{j_k}$ all be distinct (or else $\alpha(e_{j_1}, \dots, e_{j_n}) = 0$). This means that we are left with

$$\alpha(v_1, \dots, v_n) = \sum\nolimits_J a^{j_1}{}_1 \cdots a^{j_n}{}_n\,\alpha(e_{j_1}, \dots, e_{j_n})$$

where $\sum_J$ denotes the fact that we are summing over only those values of $j_k$ such that $(j_1, \dots, j_n)$ is a permutation of $(1, \dots, n)$. In other words, we have

$$\alpha(v_1, \dots, v_n) = \sum_{\sigma \in S_n} a^{\sigma 1}{}_1 \cdots a^{\sigma n}{}_n\,\alpha(e_{\sigma 1}, \dots, e_{\sigma n}) .$$

But now, by the antisymmetry of $\alpha$, we see that $\alpha(e_{\sigma 1}, \dots, e_{\sigma n}) = (\operatorname{sgn}\sigma)\,\alpha(e_1, \dots, e_n)$ and hence we are left with

$$\alpha(v_1, \dots, v_n) = \sum_{\sigma \in S_n} (\operatorname{sgn}\sigma)\,a^{\sigma 1}{}_1 \cdots a^{\sigma n}{}_n\,\alpha(e_1, \dots, e_n) . \qquad (*)$$

Using the definition of determinant and the fact that $\alpha(e_1, \dots, e_n)$ is just some scalar, we finally obtain

$$\alpha(v_1, \dots, v_n) = \det(a^j{}_i)\,\alpha(e_1, \dots, e_n) .$$

Referring back to Theorem 4.9, let us consider the special case where $\alpha(e_1, \dots, e_n) = 1$. Note that if $\{\omega^j\}$ is the basis for $V^*$ dual to $\{e_i\}$, then

$$\omega^{\sigma j}(v_i) = \omega^{\sigma j}(e_k\,a^k{}_i) = a^k{}_i\,\omega^{\sigma j}(e_k) = a^k{}_i\,\delta^{\sigma j}_k = a^{\sigma j}{}_i .$$

Using the definition of tensor product, we can therefore write $(*)$ as

$$\det(a^j{}_i) = \alpha(v_1, \dots, v_n) = \sum_{\sigma \in S_n} (\operatorname{sgn}\sigma)\,\omega^{\sigma 1} \otimes \cdots \otimes \omega^{\sigma n}(v_1, \dots, v_n)$$

which implies that the determinant function is given by

$$\alpha = \sum_{\sigma \in S_n} (\operatorname{sgn}\sigma)\,\omega^{\sigma 1} \otimes \cdots \otimes \omega^{\sigma n} .$$

In other words, if $A$ is a matrix with columns given by $v_1, \dots, v_n$ then $\det A = \alpha(v_1, \dots, v_n)$.

While we went through many detailed manipulations in arriving at these equations, we will assume from now on that the reader understands what was done in this example, and henceforth leave out some of the intermediate steps in such calculations. ∆
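Example 11.2 can be replayed directly in code. The Python sketch below implements $\alpha = \sum_\sigma (\operatorname{sgn}\sigma)\,\omega^{\sigma 1} \otimes \cdots \otimes \omega^{\sigma n}$ on $\mathbb{R}^n$ (indices run from 0, and $\omega^k(v)$ is just the $k$th component of $v$), then compares $\alpha(v_1, \dots, v_n)$, with the $v_j$ taken as the columns of a matrix, against an independent cofactor expansion. The helper names and the test matrix are our own.

```python
from itertools import permutations


def sgn(perm):
    """Sign of a permutation of (0, ..., n-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def alpha(*vectors):
    """sum over sigma of (sgn sigma) w^{sigma 1}(v_1) ... w^{sigma n}(v_n)."""
    n = len(vectors)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for i in range(n):
            term *= vectors[i][sigma[i]]  # w^{sigma i}(v_i) = v_i^{sigma i}
        total += term
    return total


def det_cofactor(M):
    """Independent check: Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det_cofactor([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


M = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
columns = [[M[i][j] for i in range(3)] for j in range(3)]  # v_j = j-th column of M
print(alpha(*columns), det_cofactor(M))  # 18 18
```

Swapping two of the column vectors flips the sign of `alpha`, and on the standard basis it returns 1, matching the normalization $\alpha(e_1, \dots, e_n) = 1$.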

At the risk of boring some readers, let us very briefly review the meaning of the binomial coefficient $\binom{n}{r} = n!/[r!(n-r)!]$. The idea is that we want to know the number of ways of picking $r$ distinct objects out of a collection of $n$ distinct objects. In other words, how many combinations of $n$ things taken $r$ at a time are there? Well, to pick $r$ objects, we have $n$ choices for the first, then $n - 1$ choices for the second, and so on down to $n - (r - 1) = n - r + 1$ choices for the $r$th. This gives us

$$n(n-1)\cdots(n-r+1) = n!/(n-r)!$$

as the number of ways of picking $r$ objects out of $n$ if we take into account the order in which the $r$ objects are chosen. In other words, this is the number of injections $\mathrm{INJ}(r, n)$ (see Section 4.6). For example, to pick three numbers in order out of the set $\{1, 2, 3, 4\}$, we might choose $(1, 3, 4)$, or we could choose $(3, 1, 4)$. It is this kind of situation that we must take into account. But for each distinct collection of $r$ objects, there are $r!$ ways of arranging these, and hence we have over-counted each collection by a factor of $r!$. Dividing by $r!$ then yields the desired result.
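The counting argument can be checked by brute force for the example in the text ($n = 4$, $r = 3$). This Python sketch uses only standard-library functions (`itertools.permutations`, `itertools.combinations`, `math.comb`, `math.factorial`).

```python
from itertools import combinations, permutations
from math import comb, factorial

n, r = 4, 3
objects = range(1, n + 1)

ordered = list(permutations(objects, r))     # ordered picks, e.g. (1, 3, 4) and (3, 1, 4)
unordered = {frozenset(p) for p in ordered}  # forget the order of each pick

print(len(ordered), factorial(n) // factorial(n - r))  # 24 24
print(len(unordered), len(ordered) // factorial(r), comb(n, r),
      len(list(combinations(objects, r))))  # 4 4 4 4
```

Each unordered collection shows up $r! = 6$ times among the ordered picks, which is exactly the over-counting factor divided out above.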

If $\{e_1, \dots, e_n\}$ is a basis for $V$ and $T \in \Lambda^r(V)$, then $T$ is determined by its values $T(e_{i_1}, \dots, e_{i_r})$ for $i_1 < \cdots < i_r$. Indeed, following the same procedure as in Example 11.2, we see that if $v_i = e_j\,a^j{}_i$ for $i = 1, \dots, r$ then

$$T(v_1, \dots, v_r) = a^{i_1}{}_1 \cdots a^{i_r}{}_r\,T(e_{i_1}, \dots, e_{i_r})$$

where each sum is over $1 \le i_k \le n$. Furthermore, each collection $\{e_{i_1}, \dots, e_{i_r}\}$ must consist of distinct basis vectors in order that $T(e_{i_1}, \dots, e_{i_r}) \ne 0$. But the antisymmetry of $T$ tells us that for any $\sigma \in S_r$, we must have

$$T(e_{i_{\sigma 1}}, \dots, e_{i_{\sigma r}}) = (\operatorname{sgn}\sigma)\,T(e_{i_1}, \dots, e_{i_r})$$

where we may choose $i_1 < \cdots < i_r$. Thus, since the number of ways of choosing $r$ distinct basis vectors $\{e_{i_1}, \dots, e_{i_r}\}$ out of the basis $\{e_1, \dots, e_n\}$ is $\binom{n}{r}$, it follows that

$$\dim \Lambda^r(V) = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} .$$

We will prove this result again when we construct a specific basis for $\Lambda^r(V)$ (see Theorem 11.8 below).

In order to define linear transformations on $\mathcal{T}_r$ that preserve symmetry (or antisymmetry), we define the symmetrizing mapping $\mathcal{S}: \mathcal{T}_r \to \mathcal{T}_r$ and alternation mapping $\mathcal{A}: \mathcal{T}_r \to \mathcal{T}_r$ by

$$(\mathcal{S}T)(v_1, \dots, v_r) = \frac{1}{r!}\sum_{\sigma \in S_r} T(v_{\sigma 1}, \dots, v_{\sigma r})$$

and

$$(\mathcal{A}T)(v_1, \dots, v_r) = \frac{1}{r!}\sum_{\sigma \in S_r} (\operatorname{sgn}\sigma)\,T(v_{\sigma 1}, \dots, v_{\sigma r})$$

where $T \in \mathcal{T}_r(V)$ and $v_1, \dots, v_r \in V$. That these are in fact linear transformations on $\mathcal{T}_r$ follows from the observation that for each $\sigma \in S_r$ the mapping $T \mapsto T_\sigma$, where

$$T_\sigma(v_1, \dots, v_r) = T(v_{\sigma 1}, \dots, v_{\sigma r}) ,$$

is linear, and any linear combination of such mappings is again a linear transformation.

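Given any $\sigma \in S_r$, it will be convenient for us to define the mapping $\hat\sigma: V^r \to V^r$ by

$$\hat\sigma(v_1, \dots, v_r) = (v_{\sigma 1}, \dots, v_{\sigma r}) .$$

This mapping permutes the order of the vectors in its argument, not the labels (i.e., the indices), and hence its argument must always be $(v_1, v_2, \dots, v_r)$ or $(w_1, w_2, \dots, w_r)$ and so forth. Then for any $T \in \mathcal{T}_r(V)$ we define $\sigma T \in \mathcal{T}_r(V)$ by

$$\sigma T = T \circ \hat\sigma$$

which is the mapping $T_\sigma$ defined above. It should be clear that $\sigma(T_1 + T_2) = \sigma T_1 + \sigma T_2$. Note also that if we write

$$\hat\sigma(v_1, \dots, v_r) = (v_{\sigma 1}, \dots, v_{\sigma r}) = (w_1, \dots, w_r)$$

then $w_i = v_{\sigma i}$ and therefore for any other $\tau \in S_r$ we have

$$\begin{aligned}
\hat\tau \circ \hat\sigma(v_1, \dots, v_r) &= \hat\tau(w_1, \dots, w_r) = (w_{\tau 1}, \dots, w_{\tau r}) \\
&= (v_{\sigma\tau 1}, \dots, v_{\sigma\tau r}) = \widehat{\sigma\tau}(v_1, \dots, v_r) .
\end{aligned}$$

This shows that

$$\hat\tau \circ \hat\sigma = \widehat{\sigma \circ \tau}$$

and hence

$$\sigma(\tau T) = \sigma(T \circ \hat\tau) = T \circ (\hat\tau \circ \hat\sigma) = T \circ \widehat{\sigma \circ \tau} = (\sigma \circ \tau)T .$$

Note also that in this notation, the alternation mapping is defined as

$$\mathcal{A}T = \frac{1}{r!}\sum_{\sigma \in S_r} (\operatorname{sgn}\sigma)(\sigma T) .$$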

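Theorem 11.3 The linear mappings $\mathcal{A}$ and $\mathcal{S}$ have the following properties:

(a) $T \in \Lambda^r(V)$ if and only if $\mathcal{A}T = T$, and $T \in \Sigma^r(V)$ if and only if $\mathcal{S}T = T$.

(b) $\mathcal{A}(\mathcal{T}_r(V)) = \Lambda^r(V)$ and $\mathcal{S}(\mathcal{T}_r(V)) = \Sigma^r(V)$.

(c) $\mathcal{A}^2 = \mathcal{A}$ and $\mathcal{S}^2 = \mathcal{S}$, i.e., $\mathcal{A}$ and $\mathcal{S}$ are projections.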
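Before the proof, the three properties can be spot-checked on components. The Python sketch below uses exact rational arithmetic and assumes a covariant tensor of rank $r$ on an $n$-dimensional space is stored as a dict from index tuples to components; evaluating $\mathcal{A}T$ on basis vectors gives the component form $(\mathcal{A}T)_{i_1\cdots i_r} = \frac{1}{r!}\sum_\sigma (\operatorname{sgn}\sigma)\,T_{i_{\sigma 1}\cdots i_{\sigma r}}$, and dropping the sign gives $\mathcal{S}$. The function `project` and the sample tensor are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sgn(perm):
    """Sign of a permutation of (0, ..., k-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def project(T, n, r, signed=True):
    """Components of AT (signed=True) or ST (signed=False)."""
    result = {}
    for idx in product(range(n), repeat=r):
        total = 0
        for sigma in permutations(range(r)):
            sign = sgn(sigma) if signed else 1
            total += sign * T[tuple(idx[k] for k in sigma)]
        result[idx] = Fraction(total, factorial(r))
    return result


n, r = 3, 3
# a hypothetical tensor with no particular symmetry
T = {idx: (idx[0] + 1) * (idx[1] + 2) ** 2 - idx[0] * idx[2] + idx[2] ** 3
     for idx in product(range(n), repeat=r)}

AT = project(T, n, r)
ST = project(T, n, r, signed=False)
print(project(AT, n, r) == AT, project(ST, n, r, signed=False) == ST)  # True True
print(AT[(0, 1, 2)] == -AT[(1, 0, 2)])  # True: AT is antisymmetric
```

One can also observe that $\mathcal{A}$ annihilates every symmetric tensor when $r \ge 2$, which is consistent with $\Lambda^r(V) \cap \Sigma^r(V) = \{0\}$.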



Proof Since the mapping $\mathcal{A}$ is more useful, we will prove the theorem only for this case, and leave the analogous results for $\mathcal{S}$ to the reader (see Exercise 11.2.1). Furthermore, all three statements of the theorem are interrelated, so we prove them together.

First suppose that $T \in \Lambda^r(V)$. From the definition of antisymmetric tensor we have $T(v_{\sigma 1}, \dots, v_{\sigma r}) = (\operatorname{sgn}\sigma)\,T(v_1, \dots, v_r)$, and thus using the fact that the order of $S_r$ is $r!$, we see that

$$\begin{aligned}
\mathcal{A}T(v_1, \dots, v_r) &= \frac{1}{r!}\sum_{\sigma \in S_r} (\operatorname{sgn}\sigma)\,T(v_{\sigma 1}, \dots, v_{\sigma r}) \\
&= \frac{1}{r!}\sum_{\sigma \in S_r} T(v_1, \dots, v_r) \\
&= T(v_1, \dots, v_r) .
\end{aligned}$$

This shows that $T \in \Lambda^r(V)$ implies $\mathcal{A}T = T$.

Next, let $T$ be any element of $\mathcal{T}_r(V)$. We may fix any particular element $\theta \in S_r$ and then apply $\mathcal{A}T$ to the vectors $v_{\theta 1}, \dots, v_{\theta r}$ to obtain

$$\mathcal{A}T(v_{\theta 1}, \dots, v_{\theta r}) = \frac{1}{r!}\sum_{\sigma \in S_r} (\operatorname{sgn}\sigma)\,T(v_{\theta\sigma 1}, \dots, v_{\theta\sigma r}) .$$

Now note that $\operatorname{sgn}\sigma = (\operatorname{sgn}\sigma)(\operatorname{sgn}\theta)(\operatorname{sgn}\theta) = (\operatorname{sgn}\theta\sigma)(\operatorname{sgn}\theta)$, and that $S_r = \{\varphi = \theta\sigma : \sigma \in S_r\}$ (this is essentially what was done in Theorem 4.3). We now see that the right-hand side of the above equation is just

$$\begin{aligned}
\frac{1}{r!}(\operatorname{sgn}\theta)\sum_{\sigma \in S_r} (\operatorname{sgn}\theta\sigma)\,T(v_{\theta\sigma 1}, \dots, v_{\theta\sigma r}) &= \frac{1}{r!}(\operatorname{sgn}\theta)\sum_{\varphi \in S_r} (\operatorname{sgn}\varphi)\,T(v_{\varphi 1}, \dots, v_{\varphi r}) \\
&= (\operatorname{sgn}\theta)\,\mathcal{A}T(v_1, \dots, v_r)
\end{aligned}$$

which shows (by definition) that $\mathcal{A}T$ is antisymmetric. In other words, this shows that $\mathcal{A}T \in \Lambda^r(V)$ for any $T \in \mathcal{T}_r(V)$, or $\mathcal{A}(\mathcal{T}_r(V)) \subset \Lambda^r(V)$.

Since the result of the earlier paragraph showed that $T = \mathcal{A}T \in \mathcal{A}(\mathcal{T}_r(V))$ for every $T \in \Lambda^r(V)$, we see that $\Lambda^r(V) \subset \mathcal{A}(\mathcal{T}_r(V))$, and therefore $\mathcal{A}(\mathcal{T}_r(V)) = \Lambda^r(V)$. This also shows that if $\mathcal{A}T = T$, then $T$ is necessarily an element of $\Lambda^r(V)$. It then follows that for any $T \in \mathcal{T}_r(V)$ we have $\mathcal{A}T \in \Lambda^r(V)$, and hence $\mathcal{A}^2 T = \mathcal{A}(\mathcal{A}T) = \mathcal{A}T$ so that $\mathcal{A}^2 = \mathcal{A}$. ∎

Suppose $A_{i_1 \cdots i_r}$ and $T^{i_1 \cdots i_r}$ (where $r \le n = \dim V$ and $1 \le i_k \le n$) are both antisymmetric tensors, and consider their contraction $A_{i_1 \cdots i_r}T^{i_1 \cdots i_r}$. For any particular set of indices $i_1, \dots, i_r$ there will be $r!$ different ordered sets $(i_1, \dots, i_r)$. But by antisymmetry, the values of $A_{i_1 \cdots i_r}$ corresponding to each ordered set will differ only by a sign, and similarly for $T^{i_1 \cdots i_r}$. This




means that the product of $A_{i_1 \cdots i_r}$ times $T^{i_1 \cdots i_r}$ summed over the $r!$ ordered sets $(i_1, \dots, i_r)$ is the same as $r!$ times a single product which we choose to be the indices $i_1, \dots, i_r$ taken in increasing order. In other words, we have

$$A_{i_1 \cdots i_r}T^{i_1 \cdots i_r} = r!\,A_{|i_1 \cdots i_r|}T^{i_1 \cdots i_r}$$

where $|i_1 \cdots i_r|$ denotes the fact that we are summing over increasing sets of indices only. For example, if we have antisymmetric tensors $A_{ijk}$ and $T^{ijk}$ in $\mathbb{R}^3$, then

$$A_{ijk}T^{ijk} = 3!\,A_{|ijk|}T^{ijk} = 6\,A_{123}T^{123}$$

(where, in this case of course, $A_{ijk}$ and $T^{ijk}$ can only differ by a scalar).
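The factor $r!$ can be confirmed numerically. In the Python sketch below (indices run from 0), two hypothetical antisymmetric tensors of rank 3 on a 4-dimensional space are manufactured by summing signed permutations of arbitrary components; the full contraction is then compared with $r!$ times the sum over increasing index triples, which `itertools.combinations` generates directly.

```python
from itertools import combinations, permutations, product
from math import factorial


def sgn(perm):
    """Sign of a permutation of (0, ..., k-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


n, r = 4, 3


def antisymmetrized(f):
    """r! times the alternation of the tensor with components f(idx); antisymmetric by construction."""
    return {idx: sum(sgn(s) * f(tuple(idx[k] for k in s)) for s in permutations(range(r)))
            for idx in product(range(n), repeat=r)}


# two hypothetical antisymmetric tensors A_{ijk} and T^{ijk}
A = antisymmetrized(lambda i: (i[0] + 1) * (i[1] + 1) ** 2 * (i[2] + 1) ** 3)
T = antisymmetrized(lambda i: (i[0] + 1) ** 3 * (i[1] + 2) * (i[2] + 1) ** 2)

full = sum(A[idx] * T[idx] for idx in product(range(n), repeat=r))      # A_{ijk} T^{ijk}
increasing = sum(A[idx] * T[idx] for idx in combinations(range(n), r))  # A_{|ijk|} T^{ijk}
print(full == factorial(r) * increasing)  # True
```

Terms with a repeated index vanish, and each of the $r!$ orderings of a distinct triple contributes the same product because the two signs cancel.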

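There is a simple but extremely useful special type of antisymmetric tensor that we now wish to define. Before doing so however, it is first convenient to introduce another useful notational device. Note that if $T \in \mathcal{T}_r$ and we replace $v_1, \dots, v_r$ in the definitions of $\mathcal{S}$ and $\mathcal{A}$ by basis vectors $e_{i_1}, \dots, e_{i_r}$, then we obtain an expression in terms of components as

$$(\mathcal{S}T)_{i_1 \cdots i_r} = \frac{1}{r!}\sum_{\sigma \in S_r} T_{i_{\sigma 1} \cdots i_{\sigma r}}$$

and

$$(\mathcal{A}T)_{i_1 \cdots i_r} = \frac{1}{r!}\sum_{\sigma \in S_r} (\operatorname{sgn}\sigma)\,T_{i_{\sigma 1} \cdots i_{\sigma r}} .$$

We will write $T_{(i_1 \cdots i_r)} = (\mathcal{S}T)_{i_1 \cdots i_r}$ and $T_{[i_1 \cdots i_r]} = (\mathcal{A}T)_{i_1 \cdots i_r}$. For example, we have

$$T_{(ij)} = \frac{1}{2!}(T_{ij} + T_{ji})$$

and

$$T_{[ij]} = \frac{1}{2!}(T_{ij} - T_{ji}) .$$

A similar definition applies to any mixed tensor such as

$$\begin{aligned}
T^{k(pq)}{}_{[ij]} &= \frac{1}{2!}\left\{T^{k(pq)}{}_{ij} - T^{k(pq)}{}_{ji}\right\} \\
&= \frac{1}{4}\left(T^{kpq}{}_{ij} + T^{kqp}{}_{ij} - T^{kpq}{}_{ji} - T^{kqp}{}_{ji}\right) .
\end{aligned}$$

Note that if $T \in \Sigma^r(V)$, then $T_{(i_1 \cdots i_r)} = T_{i_1 \cdots i_r}$, while if $T \in \Lambda^r(V)$, then $T_{[i_1 \cdots i_r]} = T_{i_1 \cdots i_r}$.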
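The bracket notation is easy to compute with. The Python sketch below (exact arithmetic, indices from 0, hypothetical sample tensors) forms $T_{(i_1\cdots i_r)}$ and $T_{[i_1\cdots i_r]}$ from the component formulas above, confirms $T_{ij} = T_{(ij)} + T_{[ij]}$ for a rank-2 tensor, and shows that the same sum fails to recover a particular rank-3 tensor (compare Exercise 11.2.6).

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sgn(perm):
    """Sign of a permutation of (0, ..., k-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def bracket(T, n, r, antisymmetric):
    """T_[i1...ir] if antisymmetric else T_(i1...ir), as dicts keyed by index tuples."""
    return {idx: Fraction(sum((sgn(s) if antisymmetric else 1) * T[tuple(idx[k] for k in s)]
                              for s in permutations(range(r))), factorial(r))
            for idx in product(range(n), repeat=r)}


def add(P, Q):
    return {idx: P[idx] + Q[idx] for idx in P}


# rank 2: T_ij = T_(ij) + T_[ij] for any T (hypothetical components)
T2 = {idx: 3 * idx[0] + idx[1] ** 2 - idx[0] * idx[1] for idx in product(range(3), repeat=2)}
print(add(bracket(T2, 3, 2, False), bracket(T2, 3, 2, True)) == T2)  # True

# rank 3: the symmetric and antisymmetric parts need not add back up to T
T3 = {idx: 1 if idx == (0, 0, 1) else 0 for idx in product(range(2), repeat=3)}
rebuilt = add(bracket(T3, 2, 3, False), bracket(T3, 2, 3, True))
print(rebuilt[(0, 0, 1)])  # 1/3, not 1
```

For rank 3 the missing piece is the part of $T$ with mixed symmetry, which is where the representation theory mentioned in that exercise enters.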

Now consider the vector space $\mathbb{R}^3$ with the standard orthonormal basis $\{e_1, e_2, e_3\}$. We define the antisymmetric tensor $\varepsilon \in \Lambda^3(\mathbb{R}^3)$ by the requirement that

$$\varepsilon_{123} = \varepsilon(e_1, e_2, e_3) = +1 .$$






MULTILINEAR MAPPINGS AND TENSORS 



Since $\dim \Lambda^3(\mathbb{R}^3) = 1$, this defines all the components of $\varepsilon$ by antisymmetry: $\varepsilon_{213} = -\varepsilon_{231} = \varepsilon_{321} = -1$, etc. If $\{\bar e_i = e_j a^j{}_i\}$ is any other orthonormal basis for $\mathbb{R}^3$ related to the first basis by an (orthogonal) transition matrix $A = (a^j{}_i)$ with determinant equal to $+1$, then it is easy to see that

$$\varepsilon(\bar e_1, \bar e_2, \bar e_3) = \det A = +1$$

also. This is because $\varepsilon(e_i, e_j, e_k) = \operatorname{sgn}\sigma$ where $\sigma$ is the permutation that takes $(1, 2, 3)$ to $(i, j, k)$. Since $\varepsilon \in \Lambda^3(\mathbb{R}^3)$, we see that $\varepsilon_{[ijk]} = \varepsilon_{ijk}$. The tensor $\varepsilon$ is frequently called the Levi-Civita tensor. However, we stress that in a nonorthonormal coordinate system, it will not generally be true that $\varepsilon_{123} = +1$.

While we have defined the $\varepsilon_{ijk}$ as the components of a tensor, it is just as common to see the Levi-Civita (or permutation) symbol $\varepsilon_{ijk}$ defined simply as an antisymmetric symbol with $\varepsilon_{123} = +1$. In fact, from now on we shall use it in this way as a convenient notation for the sign of a permutation. For notational consistency, we also define the permutation symbol $\varepsilon^{ijk}$ to have the same values as $\varepsilon_{ijk}$. A simple calculation shows that $\varepsilon_{ijk}\varepsilon^{ijk} = 3! = 6$.

It should be clear that this definition can easily be extended to an arbitrary number of dimensions. In other words, we define

$$\varepsilon_{i_1 \cdots i_n} = \begin{cases}
+1 & \text{if } (i_1, \dots, i_n) \text{ is an even permutation of } (1, 2, \dots, n) \\
-1 & \text{if } (i_1, \dots, i_n) \text{ is an odd permutation of } (1, 2, \dots, n) \\
0 & \text{otherwise.}
\end{cases}$$

This is just another way of writing $\operatorname{sgn}\sigma$ where $\sigma \in S_n$. Therefore, using this symbol, we have the convenient notation for the determinant of a matrix $A = (a^i{}_j) \in M_n(\mathcal{F})$ as

$$\det A = \varepsilon_{i_1 \cdots i_n}\,a^{i_1}{}_1 \cdots a^{i_n}{}_n .$$
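A direct transcription of these two formulas into Python, as a sketch with indices running from 0 instead of 1 (the function names are our own). `levi_civita` returns the sign of the index tuple as a permutation, or 0 when an index repeats, and `det` sums $\varepsilon_{i_1\cdots i_n} a^{i_1}{}_1 \cdots a^{i_n}{}_n$ over all index tuples with $a^i{}_j$ = `A[i][j]`.

```python
from itertools import product


def levi_civita(*indices):
    """epsilon_{i_1...i_n}: +1 / -1 for an even / odd permutation of (0, ..., n-1), else 0."""
    n = len(indices)
    if sorted(indices) != list(range(n)):
        return 0  # a repeated (or out-of-range) index
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if indices[a] > indices[b])
    return -1 if inversions % 2 else 1


def det(A):
    """det A = epsilon_{i_1...i_n} a^{i_1}_1 ... a^{i_n}_n, with a^i_j = A[i][j]."""
    n = len(A)
    total = 0
    for idx in product(range(n), repeat=n):
        term = levi_civita(*idx)
        if term == 0:
            continue
        for col, row in enumerate(idx):
            term *= A[row][col]
        total += term
    return total


print(sum(levi_civita(*idx) ** 2 for idx in product(range(3), repeat=3)))  # 6
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```

The first line reproduces $\varepsilon_{ijk}\varepsilon^{ijk} = 3! = 6$. Looping over all $n^n$ index tuples is wasteful compared with looping over permutations, but it mirrors the summation convention literally.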

We now wish to prove a very useful result. To keep our notation from getting too cluttered, it will be convenient for us to write $\delta^{ijk}_{pqr} = \delta^i_p\,\delta^j_q\,\delta^k_r$. Now note that $\varepsilon_{pqr} = 6\,\delta^{[123]}_{pqr}$. To see that this is true, simply observe that both sides are antisymmetric in $(p, q, r)$, and are equal to 1 if $(p, q, r) = (1, 2, 3)$. (This also gives us a formula for $\varepsilon_{pqr}$ as a $3 \times 3$ determinant with entries that are all Kronecker deltas. See Exercise 11.2.4.) Using $\varepsilon^{123} = 1$ we may write this as $\varepsilon^{123}\varepsilon_{pqr} = 6\,\delta^{[123]}_{pqr}$. But now the antisymmetry in $(1, 2, 3)$ yields the general result

$$\varepsilon^{ijk}\varepsilon_{pqr} = 6\,\delta^{[ijk]}_{pqr} \qquad (1)$$

which is what we wanted to show. It is also possible to prove this in another manner that serves to illustrate several useful techniques in tensor algebra.

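Example 11.3 Suppose we have an arbitrary tensor $A \in \Lambda^3(\mathbb{R}^3)$, and hence $A_{[ijk]} = A_{ijk}$. As noted above, the fact that $\dim \Lambda^3(\mathbb{R}^3) = 1$ means that we must have $A_{ijk} = \lambda\varepsilon_{ijk}$ for some scalar $\lambda \in \mathbb{R}$. Then

$$A_{ijk}\varepsilon^{ijk}\varepsilon_{pqr} = \lambda\varepsilon_{ijk}\varepsilon^{ijk}\varepsilon_{pqr} = 6\lambda\varepsilon_{pqr} = 6\lambda\varepsilon_{ijk}\,\delta^i_p\delta^j_q\delta^k_r .$$

Because $\varepsilon_{ijk} = \varepsilon_{[ijk]}$, we can antisymmetrize over the indices $i, j, k$ on the right-hand side of the above equation to obtain $6\lambda\varepsilon_{ijk}\,\delta^{[ijk]}_{pqr}$ (write out the 6 terms if you do not believe it, and see Exercise 11.2.7). This gives us

$$A_{ijk}\varepsilon^{ijk}\varepsilon_{pqr} = 6\lambda\varepsilon_{ijk}\,\delta^{[ijk]}_{pqr} = 6A_{ijk}\,\delta^{[ijk]}_{pqr} .$$

Since the antisymmetric tensor $A_{ijk}$ is contracted with another antisymmetric tensor on both sides of this equation, the discussion following the proof of Theorem 11.3 shows that we may write

$$A_{|ijk|}\varepsilon^{ijk}\varepsilon_{pqr} = 6A_{|ijk|}\,\delta^{[ijk]}_{pqr}$$

or

$$A_{123}\varepsilon^{123}\varepsilon_{pqr} = 6A_{123}\,\delta^{[123]}_{pqr} .$$

Cancelling $A_{123}$ then yields $\varepsilon^{123}\varepsilon_{pqr} = 6\,\delta^{[123]}_{pqr}$ as we had above, and hence (1) again follows from this.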

In the particular case that $p = k$, we leave it to the reader to show that

$$\varepsilon^{ijk}\varepsilon_{kqr} = 2!\,\delta^{[ij]}_{qr} = \delta^i_q\delta^j_r - \delta^i_r\delta^j_q$$

which is very useful in manipulating vectors in $\mathbb{R}^3$. As an example, consider the vectors $A, B, C \in \mathbb{R}^3$ equipped with a Cartesian coordinate system. Abusing our notation for simplicity (alternatively, we will see formally in Example 11.12 that $A^i = A_i$ for such an $A$), we have

$$(B \times C)^i = \varepsilon^{ijk}B_jC_k$$

and hence

$$A \cdot (B \times C) = \varepsilon^{ijk}A_iB_jC_k = +\varepsilon^{jki}B_jC_kA_i = B \cdot (C \times A) .$$






Other examples are given in the exercises. ∆
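Both the contracted identity and the cyclic property of the triple product can be verified exhaustively in a few lines. This Python sketch uses indices 0, 1, 2 and the closed form $\varepsilon_{ijk} = (j-i)(k-i)(k-j)/2$, a convenient shortcut valid only for three indices in $\{0, 1, 2\}$ that can be checked against the six nonzero cases; the vectors are hypothetical.

```python
from itertools import product


def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}: (j - i)(k - i)(k - j) / 2."""
    return (j - i) * (k - i) * (k - j) // 2


def delta(a, b):
    return 1 if a == b else 0


# eps^{ijk} eps_{kqr} = delta^i_q delta^j_r - delta^i_r delta^j_q, for all i, j, q, r
identity_holds = all(
    sum(eps(i, j, k) * eps(k, q, r) for k in range(3))
    == delta(i, q) * delta(j, r) - delta(i, r) * delta(j, q)
    for i, j, q, r in product(range(3), repeat=4))


def cross(B, C):
    """(B x C)^i = eps^{ijk} B_j C_k in Cartesian coordinates."""
    return [sum(eps(i, j, k) * B[j] * C[k] for j in range(3) for k in range(3)) for i in range(3)]


def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))


A, B, C = [1, 2, 3], [0, -1, 4], [2, 5, -3]  # hypothetical vectors
print(identity_holds)                              # True
print(dot(A, cross(B, C)) == dot(B, cross(C, A)))  # True
```

The same helpers give a quick numerical check of Exercise 11.2.2(a), $A \times (B \times C) = (A \cdot C)B - (A \cdot B)C$, though of course a numerical check is not a proof.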



Exercises

1. Prove Theorem 11.3 for the symmetrizing mapping $\mathcal{S}$.

2. Using the Levi-Civita symbol, prove the following vector identities in $\mathbb{R}^3$ equipped with a Cartesian coordinate system (where the vectors are actually vector fields where necessary, $f$ is differentiable, and $\nabla_i = \partial_i = \partial/\partial x^i$):

(a) $A \times (B \times C) = (A \cdot C)B - (A \cdot B)C$

(b) $(A \times B) \cdot (C \times D) = (A \cdot C)(B \cdot D) - (A \cdot D)(B \cdot C)$

(c) $\nabla \times \nabla f = 0$

(d) $\nabla \cdot (\nabla \times A) = 0$

(e) $\nabla \times (\nabla \times A) = \nabla(\nabla \cdot A) - \nabla^2 A$

(f) $\nabla \cdot (A \times B) = B \cdot (\nabla \times A) - A \cdot (\nabla \times B)$

(g) $\nabla \times (A \times B) = A(\nabla \cdot B) - B(\nabla \cdot A) + (B \cdot \nabla)A - (A \cdot \nabla)B$

3. Using the divergence theorem ($\int_V \nabla \cdot A\,d^3x = \int_S A \cdot n\,da$), prove that

[Hint: Let $C$ be a constant vector and show that

4. (a) Find the expression for $\varepsilon_{pqr}$ as a $3 \times 3$ determinant with all Kronecker deltas as entries.

(b) Write $\varepsilon^{ijk}\varepsilon_{pqr}$ as a $3 \times 3$ determinant with all Kronecker deltas as entries.

5. Suppose $V = \mathbb{R}^3$ and let $A_{ij}$ be antisymmetric and $S^{ij}$ be symmetric. Show that $A_{ij}S^{ij} = 0$ in two ways:

(a) Write out all terms in the sum and show that they cancel in pairs.

(b) Justify each of the following equalities:

$$A_{ij}S^{ij} = A_{ij}S^{ji} = -A_{ji}S^{ji} = -A_{ij}S^{ij} = 0 .$$

6. Show that a second-rank tensor $T_{ij}$ can be written in the form $T_{ij} = T_{(ij)} + T_{[ij]}$, but that a third-rank tensor can not. (The complete solution for tensors of rank higher than two requires a detailed study of representations of the symmetric group and Young diagrams.)

7. Let $A_{i_1 \cdots i_r}$ be antisymmetric, and suppose $T^{i_1 \cdots i_r \cdots}{}_{\cdots}$ is an arbitrary tensor. Show that

$$A_{i_1 \cdots i_r}T^{i_1 \cdots i_r \cdots}{}_{\cdots} = A_{i_1 \cdots i_r}T^{[i_1 \cdots i_r] \cdots}{}_{\cdots} .$$

8. (a) Let $A = (a^i{}_j)$ be a $3 \times 3$ matrix. Show

$$\varepsilon_{ijk}\,a^i{}_p\,a^j{}_q\,a^k{}_r = (\det A)\,\varepsilon_{pqr} .$$

(b) Let $A$ be a linear transformation, and let $y_{(i)} = Ax_{(i)}$. Show

$$\det[y_{(1)}, \dots, y_{(n)}] = (\det A)\det[x_{(1)}, \dots, x_{(n)}] .$$

9. Show that under orthogonal transformations $A = (a^i{}_j)$ in $\mathbb{R}^3$, the vector cross product $x = y \times z$ transforms as $\bar x^i = \varepsilon_{ijk}\bar y^j\bar z^k = (\det A)\,a^i{}_j x^j$. Discuss the difference between the cases $\det A = +1$ and $\det A = -1$.

11.3 THE EXTERIOR PRODUCT

We have seen that the tensor product of two elements of $\Lambda^r(V)$ is not generally another element of $\Lambda^{r+r}(V)$. However, using the mapping $\mathcal{A}$ we can define another product on $\Lambda^r(V)$ that turns out to be of great use. We adopt the convention of denoting elements of $\Lambda^r(V)$ by Greek letters such as $\alpha$, $\beta$, etc., which should not be confused with elements of the permutation group $S_r$.

If $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$, we define their exterior product (or wedge product) $\alpha \wedge \beta \in \Lambda^{r+s}(V)$, so that $(\alpha, \beta) \mapsto \alpha \wedge \beta$ is a mapping $\Lambda^r(V) \times \Lambda^s(V) \to \Lambda^{r+s}(V)$, by

$$\alpha \wedge \beta = \frac{(r+s)!}{r!\,s!}\,\mathcal{A}(\alpha \otimes \beta) .$$

In other words, the wedge product is just an antisymmetrized tensor product. The reader may notice that the numerical factor is just the binomial coefficient $\binom{r+s}{r} = \binom{r+s}{s}$. It is also worth remarking that many authors leave off this coefficient entirely. While there is no fundamental reason for following either convention, our definition has the advantage of simplifying expressions involving volume elements as we shall see later.

A very useful formula for computing exterior products for small values of $r$ and $s$ is given in the next theorem. By way of terminology, a permutation $\sigma \in S_{r+s}$ such that $\sigma 1 < \cdots < \sigma r$ and $\sigma(r+1) < \cdots < \sigma(r+s)$ is called an $(r, s)$-shuffle. The proof of the following theorem should help to clarify this definition.

Theorem 11.4 Suppose $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$. Then for any collection of $r + s$ vectors $v_i \in V$ (with $r + s \le \dim V$), we have

$$\alpha \wedge \beta(v_1, \dots, v_{r+s}) = \sum{}^{*}\,(\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)})$$

where $\sum^*$ denotes the sum over all permutations $\sigma \in S_{r+s}$ such that $\sigma 1 < \cdots < \sigma r$ and $\sigma(r+1) < \cdots < \sigma(r+s)$ (i.e., over all $(r, s)$-shuffles).

Proof The proof is simply a careful examination of the terms in the definition of $\alpha\wedge\beta$. By definition, we have

$$\begin{aligned} \alpha\wedge\beta(v_1, \dots, v_{r+s}) &= [(r+s)!/r!s!]\,\mathcal{A}(\alpha\otimes\beta)(v_1, \dots, v_{r+s}) \\ &= [1/r!s!] \sum_{\sigma} (\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}) \end{aligned} \qquad (*)$$

where the sum is over all $\sigma \in S_{r+s}$. Now note that there are only $\binom{r+s}{r}$ distinct collections $\{\sigma 1, \dots, \sigma r\}$, and hence there are also only $\binom{r+s}{s} = \binom{r+s}{r}$ distinct collections $\{\sigma(r+1), \dots, \sigma(r+s)\}$. Let us call the set $\{v_{\sigma 1}, \dots, v_{\sigma r}\}$ the "$\alpha$-variables," and the set $\{v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}\}$ the "$\beta$-variables." For any of the $\binom{r+s}{r}$ distinct collections of $\alpha$- and $\beta$-variables, there will be $r!$ ways of ordering the $\alpha$-variables within themselves, and $s!$ ways of ordering the $\beta$-variables within themselves. Therefore, there will be $r!s!$ possible arrangements of the $\alpha$- and $\beta$-variables within themselves for each of the $\binom{r+s}{r}$ distinct collections. Let $\sigma \in S_{r+s}$ be a permutation that yields one of these distinct collections, and assume it is the one with the property that $\sigma 1 < \cdots < \sigma r$ and $\sigma(r+1) < \cdots < \sigma(r+s)$. The proof will be finished if we can show that all the rest of the $r!s!$ members of this collection are the same.

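Let $T$ denote the term in (*) corresponding to our chosen $\sigma$. Then $T$ is given by

$$T = (\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}) .$$

This means that every other term $t$ in the distinct collection containing $T$ will be of the form

$$t = (\operatorname{sgn}\theta)\,\alpha(v_{\theta 1}, \dots, v_{\theta r})\,\beta(v_{\theta(r+1)}, \dots, v_{\theta(r+s)})$$

where the permutation $\theta \in S_{r+s}$ is such that the set $\{\theta 1, \dots, \theta r\}$ is the same as the set $\{\sigma 1, \dots, \sigma r\}$ (although possibly in a different order), and similarly, the set $\{\theta(r+1), \dots, \theta(r+s)\}$ is the same as the set $\{\sigma(r+1), \dots, \sigma(r+s)\}$. Thus the $\alpha$- and $\beta$-variables are permuted within themselves. But we may then write $\theta = \phi\sigma$ where $\phi \in S_{r+s}$ is again such that the two sets $\{\sigma 1, \dots, \sigma r\}$ and $\{\sigma(r+1), \dots, \sigma(r+s)\}$ are permuted within themselves. Because none of the transpositions that define the permutation $\phi$ interchange $\alpha$- and $\beta$-variables, we may use the antisymmetry of $\alpha$ and $\beta$ separately to obtain

$$\begin{aligned} t &= (\operatorname{sgn}\phi\sigma)\,\alpha(v_{\phi\sigma 1}, \dots, v_{\phi\sigma r})\,\beta(v_{\phi\sigma(r+1)}, \dots, v_{\phi\sigma(r+s)}) \\ &= (\operatorname{sgn}\phi\sigma)(\operatorname{sgn}\phi)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}) \\ &= (\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}) \\ &= T . \end{aligned}$$

(It was in bringing out only a single factor of $\operatorname{sgn}\phi$ that we used the fact that there is no mixing of $\alpha$- and $\beta$-variables.) In other words, the original sum over all $(r+s)!$ possible permutations $\sigma \in S_{r+s}$ has been reduced to a sum over $\binom{r+s}{r} = (r+s)!/r!s!$ distinct terms, each one of which is repeated $r!s!$ times; these repetitions exactly cancel the factor $1/r!s!$ in (*). We are thus left with

$$\alpha\wedge\beta(v_1, \dots, v_{r+s}) = \sum\nolimits^{*} (\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1}, \dots, v_{\sigma r})\,\beta(v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)})$$

where the sum is over the $(r+s)!/r!s!$ distinct collections $\{v_{\sigma 1}, \dots, v_{\sigma r}\}$ and $\{v_{\sigma(r+1)}, \dots, v_{\sigma(r+s)}\}$ subject to the requirements $\sigma 1 < \cdots < \sigma r$ and $\sigma(r+1) < \cdots < \sigma(r+s)$. ∎

Let us introduce some convenient notation for handling multiple indices. Instead of writing the ordered set $(i_1, \dots, i_r)$, we simply write $I$ where the exact range will be clear from the context. Furthermore, we write $\vec{I}$ to denote the increasing sequence $(i_1 < \cdots < i_r)$. Similarly, we shall also write $v_I$ instead of $(v_{i_1}, \dots, v_{i_r})$. To take full advantage of this notation, we first define the generalized permutation symbol $\varepsilon$ by

$$\varepsilon^{j_1 \cdots j_r}_{i_1 \cdots i_r} = \begin{cases} +1 & \text{if } (j_1, \dots, j_r) \text{ is an even permutation of } (i_1, \dots, i_r) \\ -1 & \text{if } (j_1, \dots, j_r) \text{ is an odd permutation of } (i_1, \dots, i_r) \\ 0 & \text{otherwise.} \end{cases}$$

For example, $\varepsilon^{235}_{352} = +1$, $\varepsilon^{341}_{431} = -1$, $\varepsilon^{231}_{234} = 0$, etc. In particular, if $A = (a^i_j)$ is an $n \times n$ matrix, then

$$\det A = \varepsilon^{1 \cdots n}_{i_1 \cdots i_n}\, a^{i_1}_1 \cdots a^{i_n}_n = \varepsilon^{i_1 \cdots i_n}_{1 \cdots n}\, a^1_{i_1} \cdots a^n_{i_n}$$

because

$$\varepsilon^{1 \cdots n}_{i_1 \cdots i_n} = \varepsilon^{i_1 \cdots i_n}_{1 \cdots n} = \varepsilon_{i_1 \cdots i_n} .$$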

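Using this notation and Theorem 11.4, we may write the wedge product of $\alpha$ and $\beta$ as

$$\alpha\wedge\beta(v_{i_1}, \dots, v_{i_{r+s}}) = \sum_{\substack{j_1 < \cdots < j_r \\ k_1 < \cdots < k_s}} \varepsilon^{j_1 \cdots j_r k_1 \cdots k_s}_{i_1 \cdots i_{r+s}}\, \alpha(v_{j_1}, \dots, v_{j_r})\,\beta(v_{k_1}, \dots, v_{k_s})$$

or most simply in the following form, which we state as a corollary to Theorem 11.4 for easy reference.

Corollary Suppose $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$. Then for any collection of $r + s$ vectors $v_i \in V$ (with $r + s \le \dim V$) we have

$$\alpha\wedge\beta(v_I) = \sum_{\vec{J}, \vec{K}} \varepsilon^{JK}_{I}\, \alpha(v_J)\,\beta(v_K) .$$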

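Example 11.4 Suppose $\dim V = 5$ and $\{e_1, \dots, e_5\}$ is a basis for $V$. If $\alpha \in \Lambda^2(V)$ and $\beta \in \Lambda^1(V)$, then

$$\begin{aligned} \alpha\wedge\beta(e_5, e_2, e_3) &= \sum_{j_1 < j_2,\ k} \varepsilon^{j_1 j_2 k}_{523}\, \alpha(e_{j_1}, e_{j_2})\,\beta(e_k) \\ &= \varepsilon^{235}_{523}\,\alpha(e_2, e_3)\beta(e_5) + \varepsilon^{253}_{523}\,\alpha(e_2, e_5)\beta(e_3) + \varepsilon^{352}_{523}\,\alpha(e_3, e_5)\beta(e_2) \\ &= \alpha(e_2, e_3)\beta(e_5) - \alpha(e_2, e_5)\beta(e_3) + \alpha(e_3, e_5)\beta(e_2) . \end{aligned}$$

∆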
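As a rough numerical check of Theorem 11.4 and Example 11.4, the following Python sketch evaluates $\alpha\wedge\beta$ in two ways: by the full sum over $S_{r+s}$ in (*), and by the sum over $(r, s)$-shuffles only. The particular forms `alpha` and `beta` on $\mathbb{R}^5$ are hypothetical test data, chosen only to be antisymmetric and nontrivial; all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def sign(p):
    """Sign of a permutation in one-line notation (parity of its inversions)."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def wedge_full(alpha, r, beta, s, vs):
    """(1/r!s!) times the sum over all sigma in S_{r+s}, as in (*)."""
    total = 0
    for p in permutations(range(r + s)):
        total += sign(p) * alpha(*[vs[k] for k in p[:r]]) * beta(*[vs[k] for k in p[r:]])
    return Fraction(total, factorial(r) * factorial(s))

def wedge_shuffle(alpha, r, beta, s, vs):
    """Sum over (r, s)-shuffles only, as in Theorem 11.4."""
    total = 0
    positions = range(r + s)
    for first in combinations(positions, r):                 # sigma 1 < ... < sigma r
        rest = tuple(k for k in positions if k not in first)  # increasing remainder
        total += sign(first + rest) * alpha(*[vs[k] for k in first]) * beta(*[vs[k] for k in rest])
    return total

# Hypothetical test forms on R^5 (vectors are tuples of length 5).
def alpha(u, v):  # an antisymmetric bilinear form
    return sum((i + 2 * j) * (u[i] * v[j] - u[j] * v[i])
               for i in range(5) for j in range(i + 1, 5))

def beta(w):  # a linear functional
    return sum((k + 1) * w[k] for k in range(5))

def e(k):  # standard basis vector e_k, 1-based as in the text
    return tuple(1 if i == k - 1 else 0 for i in range(5))
```

For instance, `wedge_shuffle(alpha, 2, beta, 1, (e(5), e(2), e(3)))` evaluates only three terms, and they are exactly the three terms written out in Example 11.4, while `wedge_full` visits all $3! = 6$ permutations.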

Our next theorem is a useful result in many computations. It is simply a contraction of indices in the permutation symbols.

Theorem 11.5 Let $I = (i_1, \dots, i_q)$, $J = (j_1, \dots, j_{r+s})$, $K = (k_1, \dots, k_r)$ and $L = (l_1, \dots, l_s)$. Then

$$\sum_{\vec{J}} \varepsilon^{IJ}_{1 \cdots q+r+s}\, \varepsilon^{KL}_{J} = \varepsilon^{IKL}_{1 \cdots q+r+s}$$

where $I$, $K$ and $L$ are fixed quantities, and $J$ is summed over all increasing subsets $j_1 < \cdots < j_{r+s}$ of $\{1, \dots, q+r+s\}$.

Proof The only nonvanishing terms on the left hand side can occur when $J$ is a permutation of $KL$ (or else $\varepsilon^{KL}_J = 0$), and of these possible permutations, we only have one in the sum, and that is for the increasing set $\vec{J}$. If $J$ is an even permutation of $KL$, then $\varepsilon^{KL}_J = +1$, and $\varepsilon^{IJ}_{1 \cdots q+r+s} = \varepsilon^{IKL}_{1 \cdots q+r+s}$ since an even number of transpositions is required to go from $J$ to $KL$. If $J$ is an odd permutation of $KL$, then $\varepsilon^{KL}_J = -1$, and $\varepsilon^{IJ}_{1 \cdots q+r+s} = -\varepsilon^{IKL}_{1 \cdots q+r+s}$ since an odd number of transpositions is required to go from $J$ to $KL$. The conclusion then follows immediately. ∎
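A brute-force check of Theorem 11.5 for small $q$, $r$, $s$ can be sketched as follows. The function `eps` is a hypothetical helper implementing the generalized permutation symbol $\varepsilon^{J}_{I}$ (upper indices passed first); the check runs over all index tuples $I$, $K$, $L$, including ones with repeated entries, for which both sides vanish.

```python
from itertools import combinations, product

def eps(upper, lower):
    """Generalized permutation symbol: +1/-1 if `upper` is an even/odd
    permutation of `lower` (whose entries must be distinct), else 0."""
    if len(set(lower)) != len(lower) or sorted(upper) != sorted(lower):
        return 0
    position = {x: k for k, x in enumerate(lower)}
    p = [position[x] for x in upper]
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def check_theorem_11_5(q, r, s):
    """Check sum over increasing J of eps^{IJ}_{1..N} eps^{KL}_J == eps^{IKL}_{1..N}."""
    top = tuple(range(1, q + r + s + 1))
    for I in product(top, repeat=q):
        for K in product(top, repeat=r):
            for L in product(top, repeat=s):
                lhs = sum(eps(I + J, top) * eps(K + L, J)
                          for J in combinations(top, r + s))
                if lhs != eps(I + K + L, top):
                    return False
    return True
```

The same helper reproduces the signs used in Example 11.4, e.g. `eps((2, 5, 3), (5, 2, 3))` is $\varepsilon^{253}_{523} = -1$.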

Note that we could have let $J = (j_1, \dots, j_r)$ and left out $L$ entirely in Theorem 11.5. The reason we included $L$ is shown in the next example.

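Example 11.5 Let us use Theorem 11.5 to give a simple proof of the associativity of the wedge product. In other words, we want to show that

$$\alpha\wedge(\beta\wedge\gamma) = (\alpha\wedge\beta)\wedge\gamma$$

for any $\alpha \in \Lambda^q(V)$, $\beta \in \Lambda^r(V)$ and $\gamma \in \Lambda^s(V)$. To see this, let $I = (i_1, \dots, i_q)$, $J = (j_1, \dots, j_{r+s})$, $K = (k_1, \dots, k_r)$ and $L = (l_1, \dots, l_s)$. Then we have

$$\begin{aligned} \alpha\wedge(\beta\wedge\gamma)(v_1, \dots, v_{q+r+s}) &= \sum_{\vec{I}, \vec{J}} \varepsilon^{IJ}_{1 \cdots q+r+s}\, \alpha(v_I)\,(\beta\wedge\gamma)(v_J) \\ &= \sum_{\vec{I}, \vec{J}} \varepsilon^{IJ}_{1 \cdots q+r+s}\, \alpha(v_I) \sum_{\vec{K}, \vec{L}} \varepsilon^{KL}_{J}\, \beta(v_K)\,\gamma(v_L) \\ &= \sum_{\vec{I}, \vec{K}, \vec{L}} \varepsilon^{IKL}_{1 \cdots q+r+s}\, \alpha(v_I)\,\beta(v_K)\,\gamma(v_L) . \end{aligned}$$

It is easy to see that had we started with $(\alpha\wedge\beta)\wedge\gamma$, we would have arrived at the same sum.

As was the case with the tensor product, we simply write $\alpha\wedge\beta\wedge\gamma$ from now on. Note also that a similar calculation can be done for the wedge product of any number of terms. ∆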

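We now wish to prove the basic algebraic properties of the wedge product. This will be facilitated by a preliminary result on the alternation mapping.

Theorem 11.6 If $S \in \mathcal{T}_r$ and $T \in \mathcal{T}_s$, then

$$\mathcal{A}((\mathcal{A}S)\otimes T) = \mathcal{A}(S\otimes T) = \mathcal{A}(S\otimes(\mathcal{A}T)) .$$

Proof Using the bilinearity of the tensor product and the definition of $\mathcal{A}S$ we may write

$$(\mathcal{A}S)\otimes T = (1/r!) \sum_{\sigma\in S_r} (\operatorname{sgn}\sigma)[(\sigma S)\otimes T] .$$

For each $\sigma \in S_r$, let $\phi \in S_{r+s}$ be the permutation defined by

$$(\phi 1, \dots, \phi(r+s)) = (\sigma 1, \dots, \sigma r, r+1, \dots, r+s)$$

and let $G \subset S_{r+s}$ be the set of all such $\phi$. In other words, $G$ consists of all permutations $\phi \in S_{r+s}$ that have the same effect on $1, \dots, r$ as some $\sigma \in S_r$, but leave the remaining terms $r+1, \dots, r+s$ unchanged. This means that $\operatorname{sgn}\phi = \operatorname{sgn}\sigma$, and $\phi(S\otimes T) = (\sigma S)\otimes T$ (see Exercise 11.3.1). We then have

$$(\mathcal{A}S)\otimes T = (1/r!) \sum_{\phi\in G} (\operatorname{sgn}\phi)\,\phi(S\otimes T)$$

and therefore

$$\begin{aligned} \mathcal{A}((\mathcal{A}S)\otimes T) &= [1/(r+s)!] \sum_{\tau\in S_{r+s}} (\operatorname{sgn}\tau)\,\tau\Big((1/r!) \sum_{\phi\in G} (\operatorname{sgn}\phi)\,\phi(S\otimes T)\Big) \\ &= [1/(r+s)!](1/r!) \sum_{\phi\in G} \sum_{\tau\in S_{r+s}} (\operatorname{sgn}\tau)(\operatorname{sgn}\phi)\,\tau\phi(S\otimes T) . \end{aligned}$$

But for each $\phi \in G$, we note that $S_{r+s} = \{\theta = \tau\phi : \tau \in S_{r+s}\}$, and hence

$$\begin{aligned} [1/(r+s)!] \sum_{\tau\in S_{r+s}} (\operatorname{sgn}\tau)(\operatorname{sgn}\phi)\,\tau\phi(S\otimes T) &= [1/(r+s)!] \sum_{\tau\in S_{r+s}} (\operatorname{sgn}\tau\phi)\,\tau\phi(S\otimes T) \\ &= [1/(r+s)!] \sum_{\theta\in S_{r+s}} (\operatorname{sgn}\theta)\,\theta(S\otimes T) \\ &= \mathcal{A}(S\otimes T) . \end{aligned}$$

Since this is independent of the particular $\phi \in G$ and there are $r!$ elements in $G$, we then have

$$\begin{aligned} \mathcal{A}((\mathcal{A}S)\otimes T) &= (1/r!) \sum_{\phi\in G} \mathcal{A}(S\otimes T) \\ &= \mathcal{A}(S\otimes T)\,(1/r!) \sum_{\phi\in G} 1 \\ &= \mathcal{A}(S\otimes T) . \end{aligned}$$

The proof that $\mathcal{A}(S\otimes T) = \mathcal{A}(S\otimes(\mathcal{A}T))$ is similar, and we leave it to the reader (see Exercise 11.3.2). ∎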

Note that in defining the wedge product $\alpha\wedge\beta$, there is really nothing that requires us to have $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$. We could just as well be more general and let $\alpha \in \mathcal{T}_r(V)$ and $\beta \in \mathcal{T}_s(V)$. In that case the formula given in Theorem 11.4 is certainly not valid, but we do have the following corollary to Theorem 11.6.

Corollary For any $S \in \mathcal{T}_r$ and $T \in \mathcal{T}_s$ we have $\mathcal{A}S\wedge T = S\wedge T = S\wedge\mathcal{A}T$.

Proof This follows directly from Theorem 11.6 and the wedge product definition $S\wedge T = [(r+s)!/r!s!]\,\mathcal{A}(S\otimes T)$. ∎

We are now in a position to prove some of the most important properties of the wedge product. Note that this next theorem is stated in terms of the more general definition of the wedge product.

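Theorem 11.7 Suppose $\alpha, \alpha_1, \alpha_2 \in \mathcal{T}_q(V)$, $\beta, \beta_1, \beta_2 \in \mathcal{T}_r(V)$, $\gamma \in \mathcal{T}_s(V)$ and $a \in \mathcal{F}$. Then

(a) The wedge product is bilinear. That is,

$$\begin{aligned} (\alpha_1 + \alpha_2)\wedge\beta &= \alpha_1\wedge\beta + \alpha_2\wedge\beta \\ \alpha\wedge(\beta_1 + \beta_2) &= \alpha\wedge\beta_1 + \alpha\wedge\beta_2 \\ (a\alpha)\wedge\beta &= \alpha\wedge(a\beta) = a(\alpha\wedge\beta) \end{aligned}$$

(b) $\alpha\wedge\beta = (-1)^{qr}\,\beta\wedge\alpha$.

(c) The wedge product is associative. That is,

$$\alpha\wedge(\beta\wedge\gamma) = (\alpha\wedge\beta)\wedge\gamma = [(q+r+s)!/q!r!s!]\,\mathcal{A}(\alpha\otimes\beta\otimes\gamma) .$$

Proof (a) This follows from the definition of the wedge product, the fact that $\otimes$ is bilinear and the fact that $\mathcal{A}$ is linear. This result may also be shown directly in the case that $\alpha, \alpha_1, \alpha_2 \in \Lambda^q(V)$ and $\beta, \beta_1, \beta_2 \in \Lambda^r(V)$ by using the corollary to Theorem 11.4 (see Exercise 11.3.3).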

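(b) This can also be shown directly from the corollary to Theorem 11.4 (see Exercise 11.3.4). Alternatively, we may proceed as follows. First note that for $\sigma \in S_r$ we have

$$\begin{aligned} \mathcal{A}(\sigma\alpha)(v_1, \dots, v_r) &= (1/r!) \sum_{\tau\in S_r} (\operatorname{sgn}\tau)\,\tau(\sigma\alpha)(v_1, \dots, v_r) \\ &= (1/r!) \sum_{\tau\in S_r} (\operatorname{sgn}\tau)\,\alpha(v_{\tau\sigma 1}, \dots, v_{\tau\sigma r}) \\ &= (1/r!) \sum_{\tau\in S_r} (\operatorname{sgn}\tau\sigma)(\operatorname{sgn}\sigma)\,\alpha(v_{\tau\sigma 1}, \dots, v_{\tau\sigma r}) \\ &= (\operatorname{sgn}\sigma)(1/r!) \sum_{\theta\in S_r} (\operatorname{sgn}\theta)\,\alpha(v_{\theta 1}, \dots, v_{\theta r}) \\ &= (\operatorname{sgn}\sigma)\,\mathcal{A}\alpha(v_1, \dots, v_r) \end{aligned}$$

since for any other $\tau \in S_r$ we have $\tau(\sigma\alpha) = (\tau\circ\sigma)\alpha$, and hence $\tau(\sigma\alpha)(v_1, \dots, v_r) = \alpha(v_{\tau\sigma 1}, \dots, v_{\tau\sigma r})$. Hence $\mathcal{A}(\sigma\alpha) = (\operatorname{sgn}\sigma)\,\mathcal{A}\alpha$.

Now define $\sigma \in S_{q+r}$ by

$$\sigma(1, \dots, q+r) = (q+1, \dots, q+r, 1, \dots, q) .$$

Since $\sigma$ is just a product of $qr$ transpositions, it follows that $\operatorname{sgn}\sigma = (-1)^{qr}$. We then see that

$$\begin{aligned} \alpha\otimes\beta(v_1, \dots, v_{q+r}) &= (\beta\otimes\alpha)(v_{\sigma 1}, \dots, v_{\sigma(q+r)}) \\ &= \sigma(\beta\otimes\alpha)(v_1, \dots, v_{q+r}) . \end{aligned}$$

Therefore (ignoring the factorial multiplier, which is the same on both sides of this equation),

$$\begin{aligned} \alpha\wedge\beta = \mathcal{A}(\alpha\otimes\beta) &= \mathcal{A}(\sigma(\beta\otimes\alpha)) = (\operatorname{sgn}\sigma)\,\mathcal{A}(\beta\otimes\alpha) \\ &= (-1)^{qr}\,\beta\wedge\alpha . \end{aligned}$$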

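(c) Using Theorem 11.6, we simply calculate

$$\begin{aligned} (\alpha\wedge\beta)\wedge\gamma &= [(q+r+s)!/(q+r)!s!]\,\mathcal{A}((\alpha\wedge\beta)\otimes\gamma) \\ &= [(q+r+s)!/(q+r)!s!][(q+r)!/q!r!]\,\mathcal{A}(\mathcal{A}(\alpha\otimes\beta)\otimes\gamma) \\ &= [(q+r+s)!/q!r!s!]\,\mathcal{A}(\alpha\otimes\beta\otimes\gamma) . \end{aligned}$$

Similarly, we find that $\alpha\wedge(\beta\wedge\gamma)$ yields the same result. We are therefore justified (as we also saw in Example 11.5) in writing simply $\alpha\wedge\beta\wedge\gamma$. Furthermore, it is clear that this result can be extended to any finite number of products. ∎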
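To see Theorem 11.7(b) at work numerically, here is a minimal sketch that stores a covariant tensor on $\mathbb{R}^n$ as a dictionary of components $T(e_{i_1}, \dots, e_{i_r})$ (0-based indices), implements $\mathcal{A}$, $\otimes$ and $S\wedge T = [(r+s)!/r!s!]\,\mathcal{A}(S\otimes T)$ directly from their definitions, and checks $\alpha\wedge\beta = (-1)^{qr}\beta\wedge\alpha$ for arbitrary, not necessarily antisymmetric, tensors. The component data produced by `sample_tensor` is hypothetical test input, and all names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(p):
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def alternate(T, r, n):
    """Components of A(T): (1/r!) * sum over sigma of sgn(sigma) T_{i_sigma1 ... i_sigmar}."""
    perms = list(permutations(range(r)))
    return {I: Fraction(sum(sign(p) * T[tuple(I[k] for k in p)] for p in perms), factorial(r))
            for I in product(range(n), repeat=r)}

def tensor(S, r, T, s, n):
    """Components of S (x) T: (S (x) T)_{IJ} = S_I T_J."""
    return {I + J: S[I] * T[J]
            for I in product(range(n), repeat=r) for J in product(range(n), repeat=s)}

def wedge(S, r, T, s, n):
    """S ^ T = [(r+s)!/(r! s!)] A(S (x) T)."""
    c = Fraction(factorial(r + s), factorial(r) * factorial(s))
    return {I: c * v for I, v in alternate(tensor(S, r, T, s, n), r + s, n).items()}

def sample_tensor(order, n, seed):
    """Hypothetical, deterministic integer test components (deliberately not symmetric)."""
    return {I: (7 * seed + 3 * sum((k + 1) * i for k, i in enumerate(I)) + I[0] ** 2) % 11 - 5
            for I in product(range(n), repeat=order)}
```

Because the sketch follows the general definition of the wedge product, it also illustrates the remark preceding the theorem: part (b) holds for arbitrary covariant tensors, not only for forms.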

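Example 11.6 Suppose $\alpha \in \mathcal{T}_r$ and $\beta \in \mathcal{T}_s$. Since $\alpha\wedge\beta = (-1)^{rs}\,\beta\wedge\alpha$, we see that if either $r$ or $s$ is even, then $\alpha\wedge\beta = \beta\wedge\alpha$, but if both $r$ and $s$ are odd, then $\alpha\wedge\beta = -\beta\wedge\alpha$. Therefore if $r$ is odd we have $\alpha\wedge\alpha = 0$, but if $r$ is even, then $\alpha\wedge\alpha$ is not necessarily zero. In particular, any 1-form $\alpha$ always has the property that $\alpha\wedge\alpha = 0$. ∆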

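Example 11.7 If $\alpha^1, \dots, \alpha^5$ are 1-forms on $\mathbb{R}^5$, let us define

$$\beta = \alpha^1\wedge\alpha^3 + \alpha^3\wedge\alpha^5$$

and

$$\gamma = 2\alpha^2\wedge\alpha^4\wedge\alpha^5 - \alpha^1\wedge\alpha^2\wedge\alpha^4 .$$

Using the properties of the wedge product given in Theorem 11.7 we then have

$$\begin{aligned} \beta\wedge\gamma &= (\alpha^1\wedge\alpha^3 + \alpha^3\wedge\alpha^5)\wedge(2\alpha^2\wedge\alpha^4\wedge\alpha^5 - \alpha^1\wedge\alpha^2\wedge\alpha^4) \\ &= 2\alpha^1\wedge\alpha^3\wedge\alpha^2\wedge\alpha^4\wedge\alpha^5 - \alpha^1\wedge\alpha^3\wedge\alpha^1\wedge\alpha^2\wedge\alpha^4 \\ &\quad + 2\alpha^3\wedge\alpha^5\wedge\alpha^2\wedge\alpha^4\wedge\alpha^5 - \alpha^3\wedge\alpha^5\wedge\alpha^1\wedge\alpha^2\wedge\alpha^4 \\ &= -2\alpha^1\wedge\alpha^2\wedge\alpha^3\wedge\alpha^4\wedge\alpha^5 - 0 + 0 + \alpha^1\wedge\alpha^2\wedge\alpha^3\wedge\alpha^4\wedge\alpha^5 \\ &= -\alpha^1\wedge\alpha^2\wedge\alpha^3\wedge\alpha^4\wedge\alpha^5 . \end{aligned}$$

(The two middle terms vanish because each contains a repeated 1-form, and $\alpha^3\wedge\alpha^5\wedge\alpha^1\wedge\alpha^2\wedge\alpha^4 = -\alpha^1\wedge\alpha^2\wedge\alpha^3\wedge\alpha^4\wedge\alpha^5$ because $(3, 5, 1, 2, 4)$ is an odd permutation of $(1, 2, 3, 4, 5)$.) ∆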
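The bookkeeping in Example 11.7 can be automated. In the sketch below (an illustration, not part of the text), a form is a dictionary mapping increasing index tuples $(i_1, \dots, i_k)$ to the coefficient of $\alpha^{i_1}\wedge\cdots\wedge\alpha^{i_k}$. The wedge of two such monomials is zero if an index repeats, and otherwise picks up the sign of the permutation that sorts the concatenated indices. Applied to $\beta$ and $\gamma$ of Example 11.7, it returns the single term $-\alpha^1\wedge\alpha^2\wedge\alpha^3\wedge\alpha^4\wedge\alpha^5$.

```python
def sort_sign(idx):
    """Sign of the permutation that sorts idx (entries assumed distinct)."""
    s = 1
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            if idx[i] > idx[j]:
                s = -s
    return s

def wedge(f, g):
    """Wedge product of forms given as {increasing index tuple: coefficient}."""
    out = {}
    for I, a in f.items():
        for J, b in g.items():
            idx = I + J
            if len(set(idx)) < len(idx):
                continue  # a repeated 1-form makes the term vanish
            key = tuple(sorted(idx))
            out[key] = out.get(key, 0) + sort_sign(idx) * a * b
    return {k: c for k, c in out.items() if c != 0}

beta = {(1, 3): 1, (3, 5): 1}          # alpha^1 ^ alpha^3 + alpha^3 ^ alpha^5
gamma = {(2, 4, 5): 2, (1, 2, 4): -1}  # 2 alpha^2 ^ alpha^4 ^ alpha^5 - alpha^1 ^ alpha^2 ^ alpha^4
print(wedge(beta, gamma))              # prints {(1, 2, 3, 4, 5): -1}
```

Since $\beta$ has degree 2 and $\gamma$ has degree 3, Theorem 11.7(b) predicts $\gamma\wedge\beta = (-1)^{6}\beta\wedge\gamma$, which `wedge(gamma, beta)` also confirms.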

Example 11.8 Suppose $\alpha^1, \dots, \alpha^r \in \Lambda^1(V)$ and $v_1, \dots, v_r \in V$. Using Theorem 11.5, it is easy to generalize the corollary to Theorem 11.4 to obtain (see Exercise 11.3.5)

$$\begin{aligned} \alpha^1\wedge\cdots\wedge\alpha^r(v_1, \dots, v_r) &= \sum_{i_1 \cdots i_r} \varepsilon^{i_1 \cdots i_r}_{1 \cdots r}\, \alpha^1(v_{i_1}) \cdots \alpha^r(v_{i_r}) \\ &= \det(\alpha^i(v_j)) . \end{aligned}$$

(Note that the sum is not over any increasing indices because each $\alpha^i$ is only a 1-form.) As a special case, suppose $\{e_i\}$ is a basis for $V$ and $\{\omega^j\}$ is the corresponding dual basis. Then $\omega^j(e_i) = \delta^j_i$ and hence

$$\begin{aligned} \omega^{i_1}\wedge\cdots\wedge\omega^{i_r}(e_{j_1}, \dots, e_{j_r}) &= \sum_{k_1 \cdots k_r} \varepsilon^{k_1 \cdots k_r}_{j_1 \cdots j_r}\, \omega^{i_1}(e_{k_1}) \cdots \omega^{i_r}(e_{k_r}) \\ &= \varepsilon^{i_1 \cdots i_r}_{j_1 \cdots j_r} . \end{aligned}$$

In particular, if $\dim V = n$, choosing the indices $(i_1, \dots, i_n) = (1, \dots, n) = (j_1, \dots, j_n)$, we see that

$$\omega^1\wedge\cdots\wedge\omega^n(e_1, \dots, e_n) = 1 .$$

∆
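Example 11.8 says that evaluating a wedge of 1-forms is a determinant. The sketch below computes $\alpha^1\wedge\cdots\wedge\alpha^r(v_1, \dots, v_r)$ from the permutation sum and compares it with a determinant computed independently by exact Gaussian elimination. It assumes 1-forms on $\mathbb{Q}^n$ are represented by their coefficient rows relative to the standard dual basis; the helper names and test data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def sign(p):
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def evaluate(form, v):
    """A 1-form, stored as its coefficient row, evaluated on a vector."""
    return sum(a * x for a, x in zip(form, v))

def wedge_of_one_forms(forms, vectors):
    """Sum over (i_1..i_r) of eps^{i_1..i_r}_{1..r} alpha^1(v_{i_1}) ... alpha^r(v_{i_r})."""
    total = 0
    for p in permutations(range(len(forms))):
        term = sign(p)
        for k, form in enumerate(forms):
            term *= evaluate(form, vectors[p[k]])
        total += term
    return total

def det(M):
    """Determinant by exact Gaussian elimination (independent of the sum above)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [M[i][k] - f * M[c][k] for k in range(n)]
    return d
```

Taking the forms to be the standard dual basis and the vectors to be the standard basis reproduces $\omega^1\wedge\cdots\wedge\omega^n(e_1, \dots, e_n) = 1$, and swapping two of the forms flips the sign.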

Exercises

1. Show that $\phi(S\otimes T) = (\sigma S)\otimes T$ in the proof of Theorem 11.6.

2. Finish the proof of Theorem 11.6 by showing that

$$\mathcal{A}(S\otimes T) = \mathcal{A}(S\otimes(\mathcal{A}T)) .$$

3. Using $\alpha_i \in \Lambda^q(V)$ and $\beta_i \in \Lambda^r(V)$, prove Theorem 11.7(a) directly from the corollary to Theorem 11.4.

4. Use the corollary to Theorem 11.4 to prove Theorem 11.7(b).

5. Suppose $\alpha^1, \dots, \alpha^r \in \Lambda^1(V)$ and $v_1, \dots, v_r \in V$. Show that

$$\alpha^1\wedge\cdots\wedge\alpha^r(v_1, \dots, v_r) = \det(\alpha^i(v_j)) .$$

6. Suppose $\{e_1, \dots, e_n\}$ is a basis for $V$ and $\{\omega^1, \dots, \omega^n\}$ is the corresponding dual basis. If $\alpha \in \Lambda^r(V)$ (where $r \le n$), show that

$$\alpha = \sum_{\vec{I}} \alpha(e_I)\,\omega^I = \sum_{i_1 < \cdots < i_r} \alpha(e_{i_1}, \dots, e_{i_r})\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$$

by applying both sides to $(e_{j_1}, \dots, e_{j_r})$. (See also Theorem 11.8.)

7. (Interior Product) Suppose $\alpha \in \Lambda^r(V)$ and $v, v_2, \dots, v_r \in V$. We define the $(r-1)$-form $i_v\alpha$ by

$$\begin{aligned} i_v\alpha &= 0 && \text{if } r = 0, \\ i_v\alpha &= \alpha(v) && \text{if } r = 1, \\ i_v\alpha(v_2, \dots, v_r) &= \alpha(v, v_2, \dots, v_r) && \text{if } r > 1. \end{aligned}$$

(a) Prove that $i_{u+v} = i_u + i_v$.

(b) If $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$, prove that $i_v : \Lambda^{r+s}(V) \to \Lambda^{r+s-1}(V)$ is an anti-derivation, i.e.,

$$i_v(\alpha\wedge\beta) = (i_v\alpha)\wedge\beta + (-1)^r\,\alpha\wedge(i_v\beta) .$$

(c) If $v = v^ie_i$ and $\alpha = \sum_{\vec{I}} a_{i_1 \cdots i_r}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$ where $\{\omega^i\}$ is the basis dual to $\{e_i\}$, show that

$$i_v\alpha = \sum_{i_2 < \cdots < i_r} b_{i_2 \cdots i_r}\,\omega^{i_2}\wedge\cdots\wedge\omega^{i_r}$$

where

$$b_{i_2 \cdots i_r} = \sum_j v^j a_{j i_2 \cdots i_r} .$$

(d) If $\alpha = f^1\wedge\cdots\wedge f^r$, show that

$$\begin{aligned} i_v\alpha &= \sum_{k=1}^{r} (-1)^{k-1} f^k(v)\, f^1\wedge\cdots\wedge f^{k-1}\wedge f^{k+1}\wedge\cdots\wedge f^r \\ &= \sum_{k=1}^{r} (-1)^{k-1} f^k(v)\, f^1\wedge\cdots\wedge\widehat{f^k}\wedge\cdots\wedge f^r \end{aligned}$$

where the $\widehat{\ }$ means that the term $f^k$ is to be deleted from the expression.

8. Let $V = \mathbb{R}^n$ have the standard basis $\{e_i\}$, and let the corresponding dual basis for $V^*$ be $\{\omega^i\}$.

(a) If $u, v \in V$, show that

$$\omega^i\wedge\omega^j(u, v) = \begin{vmatrix} u^i & v^i \\ u^j & v^j \end{vmatrix}$$

and that this is $\pm$ the area of the parallelogram spanned by the projection of $u$ and $v$ onto the $x^ix^j$-plane. What do you think is the significance of the different signs?

(b) Generalize this to $\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$ where $r < n$.



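9. Suppose $V = \mathcal{F}^n$, and let $v_1, \dots, v_n \in V$ have components relative to the standard basis $\{e_i\}$ defined by $v_i = e_jv^j_i$. For any $1 \le r < n$, let $s = n - r$ and define the $r$-form $\alpha$ by

$$\alpha(v_1, \dots, v_r) = \begin{vmatrix} v^1_1 & \cdots & v^1_r \\ \vdots & & \vdots \\ v^r_1 & \cdots & v^r_r \end{vmatrix}$$

and the $s$-form $\beta$ by

$$\beta(v_{r+1}, \dots, v_n) = \begin{vmatrix} v^{r+1}_{r+1} & \cdots & v^{r+1}_n \\ \vdots & & \vdots \\ v^n_{r+1} & \cdots & v^n_n \end{vmatrix} .$$

(a) Use Theorem 4.9 to show that $\alpha\wedge\beta$ is the determinant function $D$ on $\mathcal{F}^n$.

(b) Show that the sign of an $(r, s)$-shuffle is given by

$$\varepsilon^{i_1 \cdots i_r j_1 \cdots j_s}_{1 \cdots r\ r+1 \cdots r+s} = (-1)^{i_1 + \cdots + i_r + r(r+1)/2}$$

where $i_1, \dots, i_r$ and $j_1, \dots, j_s$ are listed in increasing order.

(c) If $A = (a^i_j) \in M_n(\mathcal{F})$, prove the Laplace expansion formula

$$\det A = \sum_{\vec{I}} (-1)^{i_1 + \cdots + i_r + r(r+1)/2} \begin{vmatrix} a^1_{i_1} & \cdots & a^1_{i_r} \\ \vdots & & \vdots \\ a^r_{i_1} & \cdots & a^r_{i_r} \end{vmatrix} \begin{vmatrix} a^{r+1}_{j_1} & \cdots & a^{r+1}_{j_s} \\ \vdots & & \vdots \\ a^n_{j_1} & \cdots & a^n_{j_s} \end{vmatrix}$$

where $I = \{i_1, \dots, i_r\}$ and $J = \{j_1, \dots, j_s\}$ are "complementary" sets of indices, i.e., $I \cap J = \varnothing$ and $I \cup J = \{1, 2, \dots, n\}$.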

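10. Let $\tilde{\mathcal{A}} = r!\,\mathcal{A}$ where $\mathcal{A} : \mathcal{T}_r \to \mathcal{T}_r$ is the alternation mapping. Define $\alpha\wedge\beta$ in terms of $\tilde{\mathcal{A}}$. What is $\tilde{\mathcal{A}}(f^1\otimes\cdots\otimes f^r)$ where $f^i \in V^*$?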



11. Let $I = (i_1, \dots, i_q)$, $J = (j_1, \dots, j_p)$, and $K = (k_1, \dots, k_q)$. Prove the following generalization of Example 11.3:

$$\sum_{\vec{J}} \varepsilon^{JI}_{1 \cdots p+q}\, \varepsilon^{1 \cdots p+q}_{JK} = \varepsilon^{I}_{K} = q!\, \delta^{[i_1}_{k_1} \cdots \delta^{i_q]}_{k_q} .$$

11.4 TENSOR ALGEBRAS

We define the direct sum of all tensor spaces $\mathcal{T}_r(V)$ to be the (infinite-dimensional) space $\mathcal{T}_0(V) \oplus \mathcal{T}_1(V) \oplus \cdots \oplus \mathcal{T}_r(V) \oplus \cdots$, and $\mathcal{T}(V)$ to be all elements of this space with finitely many nonzero components. This means that every element $T \in \mathcal{T}(V)$ has a unique expression of the form (ignoring zero summands)

$$T = T_{(1)\,i_1} + \cdots + T_{(r)\,i_r}$$

where each $T_{(k)\,i_k} \in \mathcal{T}_{i_k}(V)$ and $i_1 < \cdots < i_r$. The tensors $T_{(k)\,i_k}$ are called the graded components of $T$. In the special case that $T \in \mathcal{T}_r(V)$ for some $r$, then $T$ is said to be of order $r$. We define addition in $\mathcal{T}(V)$ componentwise, and we also define multiplication in $\mathcal{T}(V)$ by defining $\otimes$ to be distributive on all of $\mathcal{T}(V)$. We have therefore made $\mathcal{T}(V)$ into an associative algebra over $\mathcal{F}$ which is called the tensor algebra.

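We have seen that $\Lambda^r(V)$ is a subspace of $\mathcal{T}_r(V)$ since $\Lambda^r(V)$ is just the image of $\mathcal{T}_r(V)$ under $\mathcal{A}$. Recall also that $\Lambda^0(V) = \mathcal{T}_0(V)$ is defined to be the scalar field $\mathcal{F}$. As might therefore be expected, we define $\Lambda(V)$ to be the direct sum

$$\Lambda(V) = \Lambda^0(V) \oplus \Lambda^1(V) \oplus \cdots \oplus \Lambda^r(V) \oplus \cdots \subset \mathcal{T}(V) .$$

Note that $\Lambda^r(V) = 0$ if $r > \dim V$.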

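It is important to realize that if $\alpha \in \Lambda^r(V) \subset \mathcal{T}_r(V)$ and $\beta \in \Lambda^s(V) \subset \mathcal{T}_s(V)$, then even though $\alpha\otimes\beta \in \mathcal{T}_{r+s}(V)$, it is not generally true that $\alpha\otimes\beta \in \Lambda^{r+s}(V)$. Therefore $\Lambda(V)$ is not a subalgebra of $\mathcal{T}(V)$. However, the wedge product is a mapping from $\Lambda^r(V) \times \Lambda^s(V) \to \Lambda^{r+s}(V)$, and hence if we extend this product in the obvious manner to a bilinear mapping $\Lambda(V) \times \Lambda(V) \to \Lambda(V)$, then $\Lambda(V)$ becomes an algebra over $\mathcal{F} = \Lambda^0(V)$. In other words, if $\alpha = \alpha_1 + \cdots + \alpha_r$ with each $\alpha_i \in \Lambda^{r_i}(V)$, and $\beta = \beta_1 + \cdots + \beta_s$ with each $\beta_i \in \Lambda^{s_i}(V)$, then we define

$$\alpha\wedge\beta = \sum_{i=1}^{r} \sum_{j=1}^{s} \alpha_i\wedge\beta_j .$$

This algebra is called the Grassmann (or exterior) algebra.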

The astute reader may be wondering exactly how we add together the elements $\alpha_1 \in \Lambda^{r_1}(V)$ and $\alpha_2 \in \Lambda^{r_2}(V)$ (with $r_1 \ne r_2$) when none of the operations $(\alpha_1 + \alpha_2)(v_1, \dots, v_{r_1})$, $(\alpha_1 + \alpha_2)(v_1, \dots, v_{r_2})$ nor $(\alpha_1 + \alpha_2)(v_1, \dots, v_{r_1+r_2})$ makes any sense. The answer is that for purposes of the Grassmann algebra, we consider both $\alpha_1$ and $\alpha_2$ to be elements of $\Lambda(V)$. For example, if $\alpha_1$ is a 1-form and $\alpha_2$ is a 2-form, then we write $\alpha_1 = 0 + \alpha_1 + 0 + 0 + \cdots$ and $\alpha_2 = 0 + 0 + \alpha_2 + 0 + \cdots$, and hence $a^1\alpha_1 + a^2\alpha_2$ (where $a^i \in \mathcal{F}$) makes sense in $\Lambda(V)$. In this way, every element of $\Lambda(V)$ has a degree (recall that an $r$-form is said to be of degree $r$), and we say that $\Lambda(V)$ is a graded associative algebra.

Unlike the infinite-dimensional algebra $\mathcal{T}(V)$, the algebra $\Lambda(V)$ is finite-dimensional. This should be clear from the discussion following Example 11.2 where we showed that $\dim \Lambda^r(V) = \binom{n}{r}$ (where $n = \dim V$), and hence that $\dim \Lambda^r(V) = 0$ if $r > n$. Let us now prove this result again by constructing a specific basis for $\Lambda^r(V)$.

Theorem 11.8 Suppose $\dim V = n$. Then for $r > n$ we have $\Lambda^r(V) = \{0\}$, and if $0 \le r \le n$, then $\dim \Lambda^r(V) = \binom{n}{r}$. Therefore $\dim \Lambda(V) = 2^n$. Moreover, if $\{\omega^1, \dots, \omega^n\}$ is a basis for $V^* = \Lambda^1(V)$, then a basis for $\Lambda^r(V)$ is given by the set

$$\{\omega^{i_1}\wedge\cdots\wedge\omega^{i_r} : 1 \le i_1 < \cdots < i_r \le n\} .$$

Proof Suppose $\alpha \in \Lambda^r(V)$ where $r > \dim V = n$. By multilinearity, $\alpha$ is determined by its values on a basis $\{e_1, \dots, e_n\}$ for $V$ (see Example 11.2). But then we must have $\alpha(e_{i_1}, \dots, e_{i_r}) = 0$ because at least two of the $e_{i_k}$ are necessarily the same and $\alpha$ is antisymmetric. This means that $\alpha(v_1, \dots, v_r) = 0$ for all $v_i \in V$, and hence $\alpha = 0$. Thus $\Lambda^r(V) = \{0\}$ if $r > n$.

Now suppose that $\{\omega^1, \dots, \omega^n\}$ is the basis for $V^*$ dual to $\{e_i\}$. From Theorem 11.2, we know that $\{\omega^{i_1}\otimes\cdots\otimes\omega^{i_r} : 1 \le i_1, \dots, i_r \le n\}$ forms a basis for $\mathcal{T}_r(V)$, and since the alternation mapping $\mathcal{A}$ maps $\mathcal{T}_r(V)$ onto $\Lambda^r(V)$ (Theorem 11.3(b)), it follows that the image of the basis $\{\omega^{i_1}\otimes\cdots\otimes\omega^{i_r}\}$ for $\mathcal{T}_r(V)$ must span $\Lambda^r(V)$. If $\alpha \in \Lambda^r(V)$, then $\alpha \in \mathcal{T}_r(V)$, and hence

$$\alpha = \alpha_{i_1 \cdots i_r}\,\omega^{i_1}\otimes\cdots\otimes\omega^{i_r}$$

where the sum is over all $1 \le i_1, \dots, i_r \le n$ and $\alpha_{i_1 \cdots i_r} = \alpha(e_{i_1}, \dots, e_{i_r})$. Using Theorems 11.3(a) and 11.7(c) we have

$$\begin{aligned} \alpha = \mathcal{A}\alpha &= \alpha_{i_1 \cdots i_r}\,\mathcal{A}(\omega^{i_1}\otimes\cdots\otimes\omega^{i_r}) \\ &= \alpha_{i_1 \cdots i_r}(1/r!)\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r} \end{aligned}$$

where the sum is still over all $1 \le i_1, \dots, i_r \le n$. However, by the antisymmetry of the wedge product, the indices $i_1, \dots, i_r$ in any nonzero term must all be different, and hence the sum can only be over the $\binom{n}{r}$ distinct such combinations. For each such combination there will be $r!$ permutations $\sigma \in S_r$ of the basis vectors. If we write each of these permutations in increasing order $i_1 < \cdots < i_r$, then the wedge product changes by a factor $\operatorname{sgn}\sigma$, as does $\alpha_{i_1 \cdots i_r} = \alpha(e_{i_1}, \dots, e_{i_r})$. Therefore the signs cancel, so the $r!$ terms belonging to each combination are all equal and together cancel the factor $1/r!$. We are left with

$$\alpha = \alpha_{|i_1 \cdots i_r|}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$$

where, as mentioned previously, we use the notation $\alpha_{|i_1 \cdots i_r|}$ to mean that the sum is over increasing sets $i_1 < \cdots < i_r$. Thus we have shown that the $\binom{n}{r}$ elements $\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$ with $1 \le i_1 < \cdots < i_r \le n$ span $\Lambda^r(V)$. We must still show that they are linearly independent.

Suppose $\alpha_{|i_1 \cdots i_r|}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r} = 0$. Then for any set $\{e_{j_1}, \dots, e_{j_r}\}$ with $1 \le j_1 < \cdots < j_r \le n$ we have (using Example 11.8)

$$\begin{aligned} 0 &= \alpha_{|i_1 \cdots i_r|}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}(e_{j_1}, \dots, e_{j_r}) \\ &= \alpha_{|i_1 \cdots i_r|}\,\varepsilon^{i_1 \cdots i_r}_{j_1 \cdots j_r} \\ &= \alpha_{j_1 \cdots j_r} \end{aligned}$$

since the only nonvanishing term occurs when $\{i_1, \dots, i_r\}$ is a permutation of $\{j_1, \dots, j_r\}$ and both are increasing sets. This proves linear independence.

Finally, using the binomial theorem, we now see that

$$\dim \Lambda(V) = \sum_{r=0}^{n} \dim \Lambda^r(V) = \sum_{r=0}^{n} \binom{n}{r} = (1+1)^n = 2^n .$$

∎
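The counting in Theorem 11.8 and the reconstruction $\alpha = \alpha_{|i_1\cdots i_r|}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}$ can be checked by brute force over $\mathbb{Q}^n$ for small $n$. In this sketch (helper names and the sample form are illustrative assumptions; indices are 0-based), an alternating form is stored by its components $\alpha(e_{j_1}, \dots, e_{j_r})$ on all index tuples. Only the increasing components are kept, and the form is rebuilt on every tuple using $\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}(e_{j_1}, \dots, e_{j_r}) = \varepsilon^{i_1\cdots i_r}_{j_1\cdots j_r}$ from Example 11.8.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb, factorial

def sign(p):
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def eps(upper, lower):
    """Generalized permutation symbol eps^{upper}_{lower}."""
    if len(set(lower)) != len(lower) or sorted(upper) != sorted(lower):
        return 0
    pos = {x: k for k, x in enumerate(lower)}
    return sign([pos[x] for x in upper])

def sample_alternating(r, n):
    """A hypothetical alternating r-form on Q^n, obtained by antisymmetrizing arbitrary integers."""
    raw = {I: (5 * sum((k + 2) * i for k, i in enumerate(I)) + 1) % 7 - 3
           for I in product(range(n), repeat=r)}
    return {I: Fraction(sum(sign(p) * raw[tuple(I[k] for k in p)]
                            for p in permutations(range(r))), factorial(r))
            for I in raw}

def reconstruct(alpha, r, n):
    """Rebuild alpha on all index tuples from its increasing components alone."""
    increasing = {I: alpha[I] for I in combinations(range(n), r)}
    return {J: sum(c * eps(I, J) for I, c in increasing.items())
            for J in product(range(n), repeat=r)}

def dim_exterior_algebra(n):
    """Number of increasing index tuples of every length 0..n."""
    return sum(len(list(combinations(range(n), r))) for r in range(n + 1))
```

The design mirrors the proof: storing only increasing components loses nothing, because antisymmetry recovers every other component up to the sign $\varepsilon^{I}_{J}$.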



Example 11.9 Another useful result is the following. Suppose $\dim V = n$, and let $\{\omega^1, \dots, \omega^n\}$ be a basis for $V^*$. If $\alpha^1, \dots, \alpha^n$ are any other 1-forms in $\Lambda^1(V) = V^*$, then we may expand each $\alpha^i$ in terms of the $\omega^j$ as $\alpha^i = a^i_j\,\omega^j$. We then have

$$\begin{aligned} \alpha^1\wedge\cdots\wedge\alpha^n &= a^1_{i_1} \cdots a^n_{i_n}\,\omega^{i_1}\wedge\cdots\wedge\omega^{i_n} \\ &= a^1_{i_1} \cdots a^n_{i_n}\,\varepsilon^{i_1 \cdots i_n}_{1 \cdots n}\,\omega^1\wedge\cdots\wedge\omega^n \\ &= \det(a^i_j)\,\omega^1\wedge\cdots\wedge\omega^n . \end{aligned}$$

Recalling Example 11.1, if $\{\omega^i = dx^i\}$ is a local basis for a cotangent space $V^*$ and $\{\alpha^i = dy^i\}$ is any other local basis, then $dy^i = (\partial y^i/\partial x^j)\,dx^j$ and

$$\det(a^i_j) = \frac{\partial(y^1 \cdots y^n)}{\partial(x^1 \cdots x^n)}$$

is just the usual Jacobian of the transformation. We then have

$$dy^1\wedge\cdots\wedge dy^n = \frac{\partial(y^1 \cdots y^n)}{\partial(x^1 \cdots x^n)}\, dx^1\wedge\cdots\wedge dx^n .$$

The reader may recognize $dx^1\wedge\cdots\wedge dx^n$ as the volume element on $\mathbb{R}^n$, and hence differential forms are a natural way to describe the change of variables in multiple integrals. ∆
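The first two lines of the computation in Example 11.9 can be turned into a small numerical sketch: expand $\alpha^1\wedge\cdots\wedge\alpha^n$ over all index tuples, keep the sign $\varepsilon^{i_1\cdots i_n}_{1\cdots n}$ (zero when an index repeats), and read off the coefficient of $\omega^1\wedge\cdots\wedge\omega^n$. Taking $y$ to be Cartesian coordinates and $x$ to be polar or spherical coordinates (our choice of example, not the text's), the coefficient should reproduce the familiar Jacobians $r$ and $\rho^2\sin\theta$. Function names are illustrative, and floating-point values are compared with a tolerance.

```python
import math
from itertools import product

def levi_civita(idx):
    """eps^{i_1...i_n}_{1...n} for 0-based idx: sign of the permutation, or 0 on a repeat."""
    if len(set(idx)) < len(idx):
        return 0
    s = 1
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            if idx[i] > idx[j]:
                s = -s
    return s

def top_form_coefficient(a):
    """Coefficient c in alpha^1 ^ ... ^ alpha^n = c omega^1 ^ ... ^ omega^n,
    where alpha^i = sum_j a[i][j] omega^j."""
    n = len(a)
    total = 0.0
    for idx in product(range(n), repeat=n):
        s = levi_civita(idx)
        if s:
            term = float(s)
            for i, j in enumerate(idx):
                term *= a[i][j]
            total += term
    return total

def polar_jacobian(r, theta):
    """Rows: dy^1, dy^2 for y = (r cos theta, r sin theta), in terms of (dr, dtheta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def spherical_jacobian(rho, theta, phi):
    """Rows: dx, dy, dz in terms of (drho, dtheta, dphi); theta is the polar angle."""
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    return [[st * cp, rho * ct * cp, -rho * st * sp],
            [st * sp, rho * ct * sp, rho * st * cp],
            [ct, -rho * st, 0.0]]
```

For example, `top_form_coefficient(polar_jacobian(2.0, 0.7))` should agree with $r = 2$ up to rounding, which is the statement $dy^1\wedge dy^2 = r\,dr\wedge d\theta$.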

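Theorem 11.9 If $\alpha^1, \dots, \alpha^r \in \Lambda^1(V)$, then $\{\alpha^1, \dots, \alpha^r\}$ is a linearly dependent set if and only if $\alpha^1\wedge\cdots\wedge\alpha^r = 0$.

Proof If $\{\alpha^1, \dots, \alpha^r\}$ is linearly dependent, then at least one of the $\alpha^i$, say $\alpha^1$, is a linear combination of the others: $\alpha^1 = \sum_{j>1} a_j\alpha^j$. But then

$$\alpha^1\wedge\cdots\wedge\alpha^r = \Big(\sum_{j>1} a_j\alpha^j\Big)\wedge\alpha^2\wedge\cdots\wedge\alpha^r = \sum_{j>1} a_j\,\alpha^j\wedge\alpha^2\wedge\cdots\wedge\alpha^r = 0$$

since every term in the sum contains a repeated 1-form and hence vanishes.

Conversely, suppose that $\alpha^1, \dots, \alpha^r$ are linearly independent. We can then extend them to a basis $\{\alpha^1, \dots, \alpha^n\}$ for $V^*$ (Theorem 2.10). If $\{e_i\}$ is the corresponding dual basis for $V$, then $\alpha^1\wedge\cdots\wedge\alpha^n(e_1, \dots, e_n) = 1$ (Example 11.8). Since $\alpha^1\wedge\cdots\wedge\alpha^n = (\alpha^1\wedge\cdots\wedge\alpha^r)\wedge(\alpha^{r+1}\wedge\cdots\wedge\alpha^n)$, this implies that $\alpha^1\wedge\cdots\wedge\alpha^r \ne 0$. Therefore $\{\alpha^1, \dots, \alpha^r\}$ must be linearly dependent if $\alpha^1\wedge\cdots\wedge\alpha^r = 0$. ∎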
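Combining Example 11.8 with Theorem 11.8, the component of $\alpha^1\wedge\cdots\wedge\alpha^r$ on the basis element $\omega^{j_1}\wedge\cdots\wedge\omega^{j_r}$ ($j_1 < \cdots < j_r$) is $\alpha^1\wedge\cdots\wedge\alpha^r(e_{j_1}, \dots, e_{j_r}) = \det(a^i_{j_k})$, an $r \times r$ minor of the coefficient matrix $(a^i_j)$ where $\alpha^i = a^i_j\,\omega^j$. The sketch below (illustrative names, exact rational arithmetic) uses this to test Theorem 11.9: the wedge vanishes exactly when the rows are linearly dependent, which it compares against a rank computed by Gaussian elimination.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [M[i][k] - f * M[c][k] for k in range(n)]
    return d

def rank(M):
    """Rank by exact row reduction."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    rk = 0
    for c in range(cols):
        piv = next((i for i in range(rk, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(rows):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [M[i][k] - f * M[rk][k] for k in range(cols)]
        rk += 1
        if rk == rows:
            break
    return rk

def wedge_components(a):
    """Coefficients of alpha^1 ^ ... ^ alpha^r on omega^J, J increasing (0-based)."""
    r, n = len(a), len(a[0])
    return {J: det([[row[j] for j in J] for row in a]) for J in combinations(range(n), r)}

def wedge_is_zero(a):
    return all(c == 0 for c in wedge_components(a).values())
```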

11.5 THE TENSOR PRODUCT OF VECTOR SPACES 

We now discuss the notion of the tensor product of vector spaces. Our reason 
for presenting this discussion is that it provides the basis for defining the 
Kronecker (or direct) product of two matrices, a concept which is very useful 
in the theory of group representations. 

It should be remarked that there are many ways of defining the tensor 
product of vector spaces. While we will follow the simplest approach, there is 
another (somewhat more complicated) method involving quotient spaces that is also frequently used. This other method has the advantage that it includes infinite-dimensional spaces. The reader can find a treatment of this alternative method in, e.g., the book by Curtis (1984).

By way of nomenclature, we say that a mapping $f : U \times V \to W$ of vector spaces $U$ and $V$ to a vector space $W$ is bilinear if $f$ is linear in each variable. This is exactly the same as we defined in Section 9.4 except that now $f$ takes its values in $W$ rather than the field $\mathcal{F}$. In addition, we will need the concept of a vector space generated by a set. To describe it, suppose $S = \{s_1, \dots, s_n\}$ is some finite set of objects, and $\mathcal{F}$ is a field. While we may have an intuitive sense of what it should mean to write formal linear combinations of the form $a^1s_1 + \cdots + a^ns_n$, we should realize that the $+$ sign as used here has no meaning for an arbitrary set $S$. We now go through the formalities involved in defining such terms, and hence make the set $S$ into a vector space $T$ over $\mathcal{F}$.

The basic idea is that we want to recast each element of $S$ into the form of a function from $S$ to $\mathcal{F}$. This is because we already know how to add functions as well as multiply them by a scalar. With these ideas in mind, for each $s_i \in S$ we define a function $s_i : S \to \mathcal{F}$ by

$$s_i(s_j) = 1\,\delta_{ij}$$

where $1$ is the multiplicative identity of $\mathcal{F}$. Since addition in $\mathcal{F}$ is well-defined, as is the addition of functions and multiplication of functions by elements of $\mathcal{F}$, we see that for any $a, b \in \mathcal{F}$ and $s_i, s_j \in S$ we have

$$\begin{aligned} (a+b)s_i(s_j) &= (a+b)\delta_{ij} = a\delta_{ij} + b\delta_{ij} = as_i(s_j) + bs_i(s_j) \\ &= (as_i + bs_i)(s_j) \end{aligned}$$

and therefore $(a+b)s_i = as_i + bs_i$. Similarly, it is easy to see that $a(bs_i) = (ab)s_i$.

We now define $T$ to be the set of all functions from $S$ to $\mathcal{F}$. These functions can be written in the form $a^1s_1 + \cdots + a^ns_n$ with $a^i \in \mathcal{F}$. It should be clear that with our definition of the terms $a^is_i$, $T$ forms a vector space over $\mathcal{F}$. In fact, it is easy to see that the functions $1s_1, \dots, 1s_n$ are linearly independent. Indeed, if $0$ denotes the zero function, suppose $a^1s_1 + \cdots + a^ns_n = 0$ for some set of scalars $a^i$. Applying this function to $s_i$ (where $1 \le i \le n$) we obtain $a^i = 0$. As a matter of course, we simply write $s_i$ rather than $1s_i$.

The linear combinations just defined are called formal linear combinations of the elements of $S$, and $T$ is the vector space generated by the set $S$. $T$ is therefore the vector space of all such formal linear combinations, and is sometimes called the free vector space of $S$ over $\mathcal{F}$.
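The free vector space construction can be mirrored directly in code. In this sketch (the class name `FormalCombination`, the use of strings as elements of $S$, and the choice $\mathcal{F} = \mathbb{Q}$ are illustrative assumptions), an element of $T$ is a function $S \to \mathbb{Q}$ with finite support, stored as a dictionary. Calling it on $s_j$ evaluates the function, and the basis element $s_i$ is the function with $s_i(s_j) = \delta_{ij}$.

```python
from fractions import Fraction

class FormalCombination:
    """A function S -> Q with finite support, i.e. a formal linear combination of elements of S."""

    def __init__(self, coeffs=None):
        # Drop zero coefficients so that equal functions have equal representations.
        self.coeffs = {s: Fraction(c) for s, c in (coeffs or {}).items() if c != 0}

    @classmethod
    def basis(cls, s):
        """The function s_i with s_i(s_j) = delta_ij."""
        return cls({s: 1})

    def __call__(self, s):
        """Evaluate the function at an element s of S."""
        return self.coeffs.get(s, Fraction(0))

    def __add__(self, other):
        out = dict(self.coeffs)
        for s, c in other.coeffs.items():
            out[s] = out.get(s, 0) + c
        return FormalCombination(out)

    def __rmul__(self, a):
        """Scalar multiplication a * f."""
        return FormalCombination({s: a * c for s, c in self.coeffs.items()})

    def __eq__(self, other):
        return isinstance(other, FormalCombination) and self.coeffs == other.coeffs
```

For example, `2 * FormalCombination.basis("apple") + 3 * FormalCombination.basis("pear")` is the formal combination $2s_1 + 3s_2$; the `+` here is ordinary addition of functions, exactly as in the text, not an operation on the symbols themselves.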






Theorem 11.10 Let $U$, $V$ and $W$ be finite-dimensional vector spaces over $\mathcal{F}$. Then there exists a finite-dimensional vector space over $\mathcal{F}$ denoted by $T$ and a bilinear mapping $t : U \times V \to T$ denoted by $t(u, v) = u\otimes v$ satisfying the following properties:

(a) For every bilinear mapping $f : U \times V \to W$, there exists a unique linear transformation $\tilde{f} : T \to W$ such that $f = \tilde{f}\circ t$. In other words, for all $u \in U$ and $v \in V$ we have

$$f(u, v) = \tilde{f}(t(u, v)) = \tilde{f}(u\otimes v) .$$

(b) If $\{u_1, \dots, u_m\}$ is a basis for $U$ and $\{v_1, \dots, v_n\}$ is a basis for $V$, then $\{u_i\otimes v_j\}$ is a basis for $T$ and therefore

$$\dim T = mn = (\dim U)(\dim V) .$$

Proof Let $\{u_1, \dots, u_m\}$ be a basis for $U$ and let $\{v_1, \dots, v_n\}$ be a basis for $V$. For each pair of integers $(i, j)$ with $1 \le i \le m$ and $1 \le j \le n$ we let $t_{ij}$ be a letter (i.e., an element of some set). We now define $T$ to be the vector space over $\mathcal{F}$ consisting of all formal linear combinations of the elements $t_{ij}$. In other words, every element of $T$ is of the form $a^{ij}t_{ij}$ where $a^{ij} \in \mathcal{F}$.

Define the bilinear map $t : U \times V \to T$ by

$$u_i\otimes v_j \equiv t(u_i, v_j) = t_{ij}$$

and hence to all of $U \times V$ by "bilinear extension." In particular, if $u = x^iu_i \in U$ and $v = y^jv_j \in V$, let us define $u\otimes v$ to be that element of $T$ given by

$$u\otimes v = t(u, v) = x^iy^j\,t_{ij} .$$

It should be obvious that this does indeed define a bilinear map.

Now suppose that $f : U \times V \to W$ is any bilinear map, and remember that every element of $T$ is a linear combination of the $t_{ij}$. According to Theorem 5.1, we may define a unique linear transformation $\tilde{f} : T \to W$ by

$$\tilde{f}(t_{ij}) = f(u_i, v_j) .$$

Using the bilinearity of $f$ and the linearity of $\tilde{f}$ we then have

$$\begin{aligned} f(u, v) &= f(x^iu_i, y^jv_j) = x^iy^j f(u_i, v_j) = x^iy^j\,\tilde{f}(t_{ij}) = \tilde{f}(x^iy^j\,t_{ij}) \\ &= \tilde{f}(u\otimes v) = \tilde{f}(t(u, v)) . \end{aligned}$$

This proves the existence of a mapping $\tilde{f}$ such that $f = \tilde{f}\circ t$ as specified in (a). It is also unique, since any such $\tilde{f}$ must satisfy $\tilde{f}(t_{ij}) = \tilde{f}(t(u_i, v_j)) = f(u_i, v_j)$, and a linear map is determined by its values on the $t_{ij}$.

We have defined $T$ to be the vector space generated by the $mn$ elements $t_{ij} = u_i\otimes v_j$ where $\{u_1, \dots, u_m\}$ and $\{v_1, \dots, v_n\}$ were particular bases for $U$ and $V$ respectively. We now want to show that in fact $\{u'_i\otimes v'_j\}$ forms a basis for $T$ where $\{u'_i\}$ and $\{v'_j\}$ are arbitrary bases for $U$ and $V$. For any $u = x'^iu'_i \in U$ and $v = y'^jv'_j \in V$, we have (using the bilinearity of $\otimes$)

$$u\otimes v = x'^iy'^j(u'_i\otimes v'_j) .$$

In particular, each $t_{ij} = u_i\otimes v_j$ is a linear combination of the $u'_k\otimes v'_l$, which shows that the $mn$ elements $u'_i\otimes v'_j$ span $T$. If these $mn$ elements were linearly dependent, then we would have $\dim T < mn$, which contradicts the fact that the $mn$ elements $t_{ij}$ form a basis for $T$. Hence $\{u'_i\otimes v'_j\}$ is a basis for $T$. ∎
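In coordinates, the construction in this proof amounts to the following sketch (our own illustration, with standard bases assumed for $U = \mathbb{Q}^m$ and $V = \mathbb{Q}^n$, and a hypothetical bilinear map `f` into $W = \mathbb{Q}^2$). An element of $T$ is stored as its $m \times n$ array of coefficients on the $t_{ij}$, so $u\otimes v$ is the array $x^iy^j$, and $\tilde{f}$ is the linear map fixed by $\tilde{f}(t_{ij}) = f(u_i, v_j)$.

```python
def tensor(x, y):
    """Coordinates of u (x) v on the basis {t_ij}: the m x n array x^i y^j."""
    return [[xi * yj for yj in y] for xi in x]

def unit(k, size):
    return [1 if i == k else 0 for i in range(size)]

def induced_linear_map(f, m, n):
    """The linear map f~ on T determined by f~(t_ij) = f(u_i, v_j)."""
    values = [[f(unit(i, m), unit(j, n)) for j in range(n)] for i in range(m)]
    dim_w = len(values[0][0])

    def f_tilde(c):  # c is an m x n coefficient array, i.e. c^{ij} t_ij
        return [sum(c[i][j] * values[i][j][l] for i in range(m) for j in range(n))
                for l in range(dim_w)]

    return f_tilde

# A hypothetical bilinear map f: Q^2 x Q^3 -> Q^2.
def f(u, v):
    return [u[0] * v[1] - 2 * u[1] * v[2], 3 * u[0] * v[0] + u[1] * v[1]]
```

Checking `f(x, y) == induced_linear_map(f, 2, 3)(tensor(x, y))` for a few vectors is exactly the identity $f = \tilde{f}\circ t$ of part (a).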

The space $T$ defined in this theorem is denoted by $U\otimes V$ and called the tensor product of $U$ and $V$. Note that $T$ can be any $mn$-dimensional vector space. For example, if $m = n$ we could take $T = \mathcal{T}_2(V)$ with basis $t_{ij} = \omega^i\otimes\omega^j$, $1 \le i, j \le n$. The map $t(u_i, v_j) = u_i\otimes v_j$ then defines $u_i\otimes v_j = \omega^i\otimes\omega^j$.

Example 11.10 To show how this formalism relates to our previous treatment of tensors, consider the following example of the mapping $\tilde{f}$ defined in Theorem 11.10. Let $\{e_i\}$ be a basis for a real inner product space $U$, and let us define the real numbers $g_{ij} = \langle e_i, e_j\rangle$. If $\bar{e}_i = e_j\,p^j_i$ is another basis for $U$, then

$$\bar{g}_{ij} = \langle \bar{e}_i, \bar{e}_j\rangle = p^r_i\,p^s_j\,\langle e_r, e_s\rangle = p^r_i\,p^s_j\,g_{rs}$$

so that the $g_{ij}$ transform like the components of a covariant tensor of order 2. This means that we may define the tensor $g \in \mathcal{T}_2(U)$ by $g(u, v) = \langle u, v\rangle$. This tensor is called the metric tensor on $U$ (see Section 11.10).

Now suppose that we are given a positive definite symmetric bilinear form (i.e., an inner product) $g = \langle\ ,\ \rangle : U \times U \to \mathcal{F}$. Then the mapping $\tilde{g}$ is just the metric because

$$\tilde{g}(e_i\otimes e_j) = g(e_i, e_j) = \langle e_i, e_j\rangle = g_{ij} .$$

Therefore, if $u = u^ie_i$ and $v = v^je_j$ are vectors in $U$, we see that

$$\tilde{g}(u\otimes v) = g(u, v) = \langle u, v\rangle = u^iv^j\langle e_i, e_j\rangle = g_{ij}u^iv^j .$$

If $\{\omega^i\}$ is the basis for $U^*$ dual to $\{e_i\}$, then according to our earlier formalism, we would write this as $g = g_{ij}\,\omega^i\otimes\omega^j$. ∆
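A quick numerical check of the transformation law $\bar{g}_{ij} = p^r_i\,p^s_j\,g_{rs}$ and of $\tilde{g}(u\otimes v) = g_{ij}u^iv^j$ is sketched below. It assumes $U = \mathbb{R}^3$ with the usual dot product, a hypothetical basis `E` (rows are the $e_i$ in standard coordinates), and a hypothetical invertible change-of-basis matrix `P` with `P[j][i]` $= p^j_i$; integer entries keep the comparison exact.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram(basis):
    """g_ij = <e_i, e_j> for a list of basis vectors."""
    return [[dot(ei, ej) for ej in basis] for ei in basis]

def change_basis(basis, P):
    """ebar_i = e_j p^j_i, with P[j][i] = p^j_i."""
    n = len(basis)
    return [[sum(P[j][i] * basis[j][k] for j in range(n)) for k in range(len(basis[0]))]
            for i in range(n)]

def transform_metric(g, P):
    """gbar_ij = p^r_i p^s_j g_rs."""
    n = len(g)
    return [[sum(P[r][i] * P[s][j] * g[r][s] for r in range(n) for s in range(n))
             for j in range(n)] for i in range(n)]

def metric_value(g, u, v):
    """g~(u (x) v) = g_ij u^i v^j, with components relative to the basis defining g."""
    n = len(g)
    return sum(g[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

E = [[1, 0, 0], [1, 1, 0], [0, 2, 1]]  # hypothetical basis e_1, e_2, e_3 of R^3
P = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # hypothetical invertible matrix (det 3)
```

Comparing `gram(change_basis(E, P))` with `transform_metric(gram(E), P)` checks the covariant transformation law directly, without ever forming the dual basis.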
Some of the main applications in mathematics and physics (e.g., in the
theory of group representations) of the tensor product of two vector spaces are 
contained in the next two results. While the ideas are simple enough, the nota- 
tion becomes somewhat awkward because of the double indices. 

Theorem 11.11 Let U and V have the respective bases {u 1; . . . , u m } and 
{v b . . . , v n }, and suppose the linear operators S G L(U) and T G L(V) have 

matrix representations A = (a 1 ,) and B = (b'j) respectively. Then there exists a 
linear transformation S®T:U®V^»U®V such that for all u G U and v G 
V we have (S ® T)(u ® v) = S(u) ® T(v). 

Furthermore, the matrix C of S ® T relative to the ordered basis 

{Uj ® Vi, .... Ui ® V n , U 2 ® V l5 .... U 2 ® V n , .... U m ® Vi, .... U m ® V n } 

for U ® V is the mn x mn matrix given in block matrix form as 



The matrix C is called the Kronecker (or direct or tensor) product of the 

matrices A and B, and will also be written as C = A ® B. 

Proof Since $S$ and $T$ are linear and $\otimes$ is bilinear, it is easy to see that the mapping $f : U \times V \to U \otimes V$ defined by $f(u, v) = S(u) \otimes T(v)$ is bilinear. Therefore, according to Theorem 11.10, there exists a unique linear transformation $\tilde f \in L(U \otimes V)$ such that $\tilde f(u \otimes v) = S(u) \otimes T(v)$. We denote the mapping $\tilde f$ by $S \otimes T$. Thus, $(S \otimes T)(u \otimes v) = S(u) \otimes T(v)$.

Finding the matrix $C$ of $S \otimes T$ is straightforward enough. We have $S(u_i) = u_j a^j{}_i$ and $T(v_i) = v_j b^j{}_i$, and hence

$$(S \otimes T)(u_i \otimes v_j) = S(u_i) \otimes T(v_j) = a^r{}_i\, b^s{}_j\, (u_r \otimes v_s) .$$

Now recall that the $i$th column of the matrix representation of an operator is just the image of the $i$th basis vector under the transformation (see Theorem 5.11). In the present case, we will have to use pairs of indices to label the rows and columns of $C$. Relative to the ordered basis

$$\{u_1 \otimes v_1, \dots, u_1 \otimes v_n,\; u_2 \otimes v_1, \dots, u_2 \otimes v_n,\; \dots,\; u_m \otimes v_1, \dots, u_m \otimes v_n\}$$

for $U \otimes V$, we then see that, for example, the $(1, 1)$th column of $C$ is the vector $(S \otimes T)(u_1 \otimes v_1) = a^r{}_1 b^s{}_1 (u_r \otimes v_s)$ given by

$$(a^1{}_1 b^1{}_1, \dots, a^1{}_1 b^n{}_1,\; a^2{}_1 b^1{}_1, \dots, a^2{}_1 b^n{}_1,\; \dots,\; a^m{}_1 b^1{}_1, \dots, a^m{}_1 b^n{}_1)$$

and in general, the $(i, j)$th column is given by

$$(a^1{}_i b^1{}_j, \dots, a^1{}_i b^n{}_j,\; a^2{}_i b^1{}_j, \dots, a^2{}_i b^n{}_j,\; \dots,\; a^m{}_i b^1{}_j, \dots, a^m{}_i b^n{}_j) .$$

This shows that the matrix $C$ has the desired form. ∎
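The index bookkeeping in the proof can be checked with a minimal Python sketch. It builds $C$ entry by entry from the rule "row $(r, s)$, column $(i, j)$ holds $a^r{}_i b^s{}_j$", with rows and columns ordered as in the basis above. The sketch assumes square matrices stored as lists of rows and uses 0-based indices. The function name `kron` is our own choice and is not notation from the text.

```python
from itertools import product


def kron(A, B):
    """Kronecker product of an m x m matrix A and an n x n matrix B.

    Row and column indices are pairs (r, s) flattened as r * n + s, i.e. in
    the order (1,1), ..., (1,n), (2,1), ..., (m,n) of the basis u_r (x) v_s.
    The entry in row (r, s) and column (i, j) is a^r_i * b^s_j.
    """
    m, n = len(A), len(B)
    C = [[0] * (m * n) for _ in range(m * n)]
    for r, s, i, j in product(range(m), range(n), range(m), range(n)):
        C[r * n + s][i * n + j] = A[r][i] * B[s][j]
    return C


A = [[1, 2],
     [3, 4]]
B = [[0, 5],
     [6, 7]]

for row in kron(A, B):
    print(row)
# [0, 5, 0, 10]
# [6, 7, 12, 14]
# [0, 15, 0, 20]
# [18, 21, 24, 28]
```

Each $2 \times 2$ block of the output is $a^r{}_i B$. For instance, the upper right block is $a^1{}_2 B = 2B$.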

Theorem 11.12 Let $U$ and $V$ be finite-dimensional vector spaces over $\mathcal{F}$.

(a) If $S_1, S_2 \in L(U)$ and $T_1, T_2 \in L(V)$, then

$$(S_1 \otimes T_1)(S_2 \otimes T_2) = S_1 S_2 \otimes T_1 T_2 .$$

Moreover, if $A_i$ and $B_i$ are the matrix representations of $S_i$ and $T_i$ respectively (relative to fixed bases for $U$ and $V$), then $(A_1 \otimes B_1)(A_2 \otimes B_2) = A_1 A_2 \otimes B_1 B_2$.

(b) If $S \in L(U)$ and $T \in L(V)$, then $\mathrm{Tr}(S \otimes T) = (\mathrm{Tr}\, S)(\mathrm{Tr}\, T)$.

(c) If $S \in L(U)$ and $T \in L(V)$, and if $S^{-1}$ and $T^{-1}$ exist, then

$$(S \otimes T)^{-1} = S^{-1} \otimes T^{-1} .$$

Conversely, if $(S \otimes T)^{-1}$ exists, then $S^{-1}$ and $T^{-1}$ also exist, and $(S \otimes T)^{-1} = S^{-1} \otimes T^{-1}$.

Proof (a) For any $u \in U$ and $v \in V$ we have

$$\begin{aligned} (S_1 \otimes T_1)(S_2 \otimes T_2)(u \otimes v) &= (S_1 \otimes T_1)(S_2(u) \otimes T_2(v)) \\ &= S_1 S_2(u) \otimes T_1 T_2(v) \\ &= (S_1 S_2 \otimes T_1 T_2)(u \otimes v) . \end{aligned}$$

Since both sides are linear and the vectors $u \otimes v$ span $U \otimes V$, this proves the first assertion. As to the matrix representations, simply note that $A_i \otimes B_i$ is the representation of $S_i \otimes T_i$, and $A_1 A_2 \otimes B_1 B_2$ is the representation of $S_1 S_2 \otimes T_1 T_2$ (since the representation of a product of linear operators is the product of their matrices).

(b) Recall that the trace of a linear operator is defined to be the trace of any matrix representation of the operator (see Theorem 5.19). Therefore, if $A = (a^i{}_j)$ is the matrix of $S$ and $B = (b^i{}_j)$ is the matrix of $T$, we see from Theorem 11.11 that the diagonal blocks of $A \otimes B$ are the matrices $a^1{}_1 B, \dots, a^m{}_m B$. Hence the diagonal elements of $A \otimes B$ are $a^1{}_1 b^1{}_1, \dots, a^1{}_1 b^n{}_n, \dots, a^m{}_m b^1{}_1, \dots, a^m{}_m b^n{}_n$. Therefore the sum of these diagonal elements is just

$$\begin{aligned} \mathrm{Tr}(A \otimes B) &= a^1{}_1 \Big(\sum_i b^i{}_i\Big) + \cdots + a^m{}_m \Big(\sum_i b^i{}_i\Big) \\ &= (\mathrm{Tr}\, A)(\mathrm{Tr}\, B) . \end{aligned}$$

(c) We first note that if $1$ denotes the identity transformation, then

$$(1 \otimes 1)(u \otimes v) = u \otimes v$$

and hence $1 \otimes 1 = 1$. Next note that $u \otimes v = (u + 0) \otimes v = u \otimes v + 0 \otimes v$, and hence $0 \otimes v = 0$. Similarly, it is clear that $u \otimes 0 = 0$. This then shows that

$$(S \otimes 0)(u \otimes v) = S(u) \otimes 0 = 0$$

so that $S \otimes 0 = 0$, and similarly $0 \otimes T = 0$.

Now, if $S$ and $T$ are invertible, then by part (a) we see that

$$(S^{-1} \otimes T^{-1})(S \otimes T) = S^{-1}S \otimes T^{-1}T = 1 \otimes 1 = 1$$

and similarly for $(S \otimes T)(S^{-1} \otimes T^{-1})$. Therefore $(S \otimes T)^{-1} = S^{-1} \otimes T^{-1}$.

Conversely, suppose that $S \otimes T$ is invertible. To prove that $S$ and $T$ are also invertible we use Theorem 5.9: a linear operator on a finite-dimensional space is invertible if and only if its kernel is zero. Since $S \otimes T$ is invertible we must have $T \neq 0$ (because $S \otimes 0 = 0$), and hence there exists $v \in V$ such that $T(v) \neq 0$. Suppose $u \in U$ is such that $S(u) = 0$. Then

$$0 = S(u) \otimes T(v) = (S \otimes T)(u \otimes v)$$

which implies that $u \otimes v = 0$ (since $S \otimes T$ is invertible). But $v \neq 0$, and hence we must have $u = 0$. This shows that $\mathrm{Ker}\, S = \{0\}$, so that $S$ is invertible. Similarly, had we started with $S \neq 0$, we would have found that $T$ is invertible. ∎
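Parts (a) to (c) can be spot-checked numerically. The following Python sketch is only a sanity check on small integer matrices chosen arbitrarily, and it proves nothing. It uses exact `Fraction` arithmetic so that the inverse in part (c) compares exactly. The helper names are hypothetical.

```python
from fractions import Fraction


def kron(A, B):
    # Row (r, s), column (i, j) holds A[r][i] * B[s][j] (Theorem 11.11).
    m, n = len(A), len(B)
    return [[A[r][i] * B[s][j] for i in range(m) for j in range(n)]
            for r in range(m) for s in range(n)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def inverse_2x2(P):
    # Exact inverse of an invertible 2 x 2 matrix.
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A1, B1 = [[1, 2], [0, 1]], [[2, 0], [1, 3]]
A2, B2 = [[3, 1], [1, 1]], [[1, 1], [0, 2]]

# (a) (A1 (x) B1)(A2 (x) B2) = A1 A2 (x) B1 B2
part_a = matmul(kron(A1, B1), kron(A2, B2)) == kron(matmul(A1, A2), matmul(B1, B2))
# (b) Tr(A (x) B) = (Tr A)(Tr B)
part_b = trace(kron(A1, B1)) == trace(A1) * trace(B1)
# (c) (A2^{-1} (x) B2^{-1})(A2 (x) B2) is the 4 x 4 identity
part_c = matmul(kron(inverse_2x2(A2), inverse_2x2(B2)), kron(A2, B2)) == identity(4)
print(part_a, part_b, part_c)  # True True True
```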

Exercises 

1. Give a direct proof of the matrix part of Theorem 11.12(a) using the 
definition of the Kronecker product of two matrices. 

2. Suppose $A \in L(U)$ and $B \in L(V)$ where $\dim U = n$ and $\dim V = m$. Show that

$$\det(A \otimes B) = (\det A)^m (\det B)^n .$$
11.6 VOLUMES IN $\mathbb{R}^3$

Instead of starting out with an abstract presentation of volumes, we shall first go through an intuitive elementary discussion beginning with $\mathbb{R}^2$, then going to $\mathbb{R}^3$, and finally generalizing to $\mathbb{R}^n$ in the next section.

First consider a parallelogram in $\mathbb{R}^2$ (with the usual norm) defined by the vectors $X$ and $Y$ as shown.

[Figure: parallelogram with edges $X$ and $Y$.]

Note that $h = \|Y\| \sin\theta$ and $b = \|Y\| \cos\theta$, and also that the area of each triangle is given by $A_1 = (1/2)bh$. Then the area of the rectangle is given by $A_2 = (\|X\| - b)h$, and the area of the entire parallelogram is given by

$$A = 2A_1 + A_2 = bh + (\|X\| - b)h = \|X\| h = \|X\| \|Y\| \sin\theta . \tag{1}$$



The reader should recognize this as the magnitude of the elementary "vector cross product" $X \times Y$ of the ordered pair of vectors $(X, Y)$. This vector is defined to have a direction normal to the plane spanned by $X$ and $Y$, given by the "right hand rule" (i.e., out of the plane in this case).

If we define the usual orthogonal coordinate system with the $x$-axis parallel to the vector $X$, then

$$X = (x^1, x^2) = (\|X\|, 0) \quad\text{and}\quad Y = (y^1, y^2) = (\|Y\|\cos\theta, \|Y\|\sin\theta)$$

and hence we see that the determinant with columns formed from the vectors $X$ and $Y$ is just

$$\begin{vmatrix} x^1 & y^1 \\ x^2 & y^2 \end{vmatrix} = \begin{vmatrix} \|X\| & \|Y\|\cos\theta \\ 0 & \|Y\|\sin\theta \end{vmatrix} = \|X\| \|Y\| \sin\theta = A . \tag{2}$$

Notice that if we interchanged the vectors $X$ and $Y$ in the diagram, then the determinant would change sign and the vector $X \times Y$ (which by definition has a direction dependent on the ordered pair $(X, Y)$) would point into the page. Thus the area of a parallelogram (which is always positive by definition) defined by two vectors in $\mathbb{R}^2$ is in general given by the absolute value of the determinant (2).

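In terms of the usual inner product (or "dot product") $\langle\ ,\ \rangle$ on $\mathbb{R}^2$, we have $\langle X, X\rangle = \|X\|^2$ and $\langle X, Y\rangle = \langle Y, X\rangle = \|X\| \|Y\| \cos\theta$, and hence

$$\begin{aligned} A^2 &= \|X\|^2 \|Y\|^2 \sin^2\theta \\ &= \|X\|^2 \|Y\|^2 (1 - \cos^2\theta) \\ &= \|X\|^2 \|Y\|^2 - \langle X, Y\rangle^2 . \end{aligned}$$

Therefore we see that the area is also given by the positive square root of the determinant

$$A^2 = \begin{vmatrix} \langle X, X\rangle & \langle X, Y\rangle \\ \langle Y, X\rangle & \langle Y, Y\rangle \end{vmatrix} . \tag{3}$$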



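It is also worth noting that the inner product may be written in the form $\langle X, Y\rangle = x^1 y^1 + x^2 y^2$, and thus in terms of matrices we may write

$$\begin{pmatrix} \langle X, X\rangle & \langle X, Y\rangle \\ \langle Y, X\rangle & \langle Y, Y\rangle \end{pmatrix} = \begin{pmatrix} x^1 & x^2 \\ y^1 & y^2 \end{pmatrix} \begin{pmatrix} x^1 & y^1 \\ x^2 & y^2 \end{pmatrix} .$$

Hence, taking the determinant of this equation (using Theorems 4.8 and 4.1), we find that the determinant (3) is the square of the determinant in equation (2). So (at least in $\mathbb{R}^2$) the determinant (3) also implies that the area is given by the absolute value of the determinant in equation (2).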
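Here is a quick numerical check of equations (1), (2) and (3), written as a small Python sketch. The sample vectors $X = (3, 1)$ and $Y = (1, 2)$ are arbitrary, and the function names are our own.

```python
import math


def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))


def area_three_ways(X, Y):
    """Area of the parallelogram with edges X, Y in R^2, computed from
    equation (1), from |determinant (2)| and from the Gram determinant (3)."""
    nx, ny = math.sqrt(dot(X, X)), math.sqrt(dot(Y, Y))
    theta = math.acos(dot(X, Y) / (nx * ny))          # angle between X and Y
    from_sine = nx * ny * math.sin(theta)                       # (1)
    from_det = abs(X[0] * Y[1] - Y[0] * X[1])                   # |(2)|
    gram = dot(X, X) * dot(Y, Y) - dot(X, Y) * dot(Y, X)        # (3) = A^2
    return from_sine, from_det, math.sqrt(gram)


print(area_three_ways((3, 1), (1, 2)))   # all three are 5, up to rounding
```

Swapping the arguments leaves all three values unchanged. The signed determinant $x^1 y^2 - y^1 x^2$, however, would change sign, as noted above.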

It is now easy to extend this discussion to a parallelogram in $\mathbb{R}^3$. Indeed, if $X = (x^1, x^2, x^3)$ and $Y = (y^1, y^2, y^3)$ are vectors in $\mathbb{R}^3$, then equation (1) is unchanged, because any two independent vectors in $\mathbb{R}^3$ span a plane, which is just a copy of $\mathbb{R}^2$. Equation (3) also remains unchanged since its derivation did not depend on the specific coordinates of $X$ and $Y$ in $\mathbb{R}^2$. However, the left-hand part of equation (2) does not apply (although we will see below that the three-dimensional version determines a volume in $\mathbb{R}^3$).

As a final remark on parallelograms, note that if $X$ and $Y$ are linearly dependent, then $aX + bY = 0$ with $a$ and $b$ not both zero. If, say, $b \neq 0$, then $Y = -(a/b)X$, and hence $X$ and $Y$ are collinear. Therefore $\theta$ equals $0$ or $\pi$, so that all equations for the area in terms of $\sin\theta$ are equal to zero. Since $X$ and $Y$ are dependent, this also means that the determinant in equation (2) equals zero, and everything is consistent.
We now take a look at volumes in $\mathbb{R}^3$. Consider three linearly independent vectors $X = (x^1, x^2, x^3)$, $Y = (y^1, y^2, y^3)$ and $Z = (z^1, z^2, z^3)$, and consider the parallelepiped with edges defined by these three vectors (in the given order $(X, Y, Z)$).

We claim that the volume of this parallelepiped is given both by the positive square root of the determinant

$$\begin{vmatrix} \langle X, X\rangle & \langle X, Y\rangle & \langle X, Z\rangle \\ \langle Y, X\rangle & \langle Y, Y\rangle & \langle Y, Z\rangle \\ \langle Z, X\rangle & \langle Z, Y\rangle & \langle Z, Z\rangle \end{vmatrix} \tag{4}$$

and by the absolute value of the determinant

$$\begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} . \tag{5}$$



To see this, first note that the volume of the parallelepiped is the area of the base times the height. Here the area $A$ of the base is given by equation (3), and the height $\|U\|$ is the length of the projection $U$ of $Z$ onto the orthogonal complement in $\mathbb{R}^3$ of the space spanned by $X$ and $Y$. In other words, if $W$ is the subspace of $V = \mathbb{R}^3$ spanned by $X$ and $Y$, then (by Theorem 2.22) $V = W^\perp \oplus W$, and hence by Theorem 2.12 we may write

$$Z = U + aX + bY$$

where $U \in W^\perp$ and $a, b \in \mathbb{R}$ are uniquely determined (the uniqueness of $a$ and $b$ actually follows from Theorem 2.3 together with Theorem 2.12). By definition we have $\langle X, U\rangle = \langle Y, U\rangle = 0$, and therefore

$$\begin{aligned} \langle X, Z\rangle &= a\|X\|^2 + b\langle X, Y\rangle \\ \langle Y, Z\rangle &= a\langle Y, X\rangle + b\|Y\|^2 \\ \langle U, Z\rangle &= \|U\|^2 . \end{aligned} \tag{6}$$



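We now wish to solve the first two of these equations for $a$ and $b$ by Cramer's rule (Theorem 4.13). Note that the determinant of the matrix of coefficients is just the determinant (3), and hence is just the square of the area $A$ of the base of the parallelepiped. Applying Cramer's rule we have

$$aA^2 = \begin{vmatrix} \langle X, Z\rangle & \langle X, Y\rangle \\ \langle Y, Z\rangle & \langle Y, Y\rangle \end{vmatrix} = -\begin{vmatrix} \langle X, Y\rangle & \langle X, Z\rangle \\ \langle Y, Y\rangle & \langle Y, Z\rangle \end{vmatrix}$$

$$bA^2 = \begin{vmatrix} \langle X, X\rangle & \langle X, Z\rangle \\ \langle Y, X\rangle & \langle Y, Z\rangle \end{vmatrix} .$$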


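Denoting the volume by $\mathrm{Vol}(X, Y, Z)$, we now have (using the last of equations (6) together with $U = Z - aX - bY$)

$$\mathrm{Vol}^2(X, Y, Z) = A^2 \|U\|^2 = A^2 \langle U, Z\rangle = A^2 \big(\langle Z, Z\rangle - a\langle X, Z\rangle - b\langle Y, Z\rangle\big)$$

so that substituting the expressions for $A^2$, $aA^2$ and $bA^2$, we find

$$\mathrm{Vol}^2(X, Y, Z) = \langle Z, Z\rangle \begin{vmatrix} \langle X, X\rangle & \langle X, Y\rangle \\ \langle Y, X\rangle & \langle Y, Y\rangle \end{vmatrix} + \langle X, Z\rangle \begin{vmatrix} \langle X, Y\rangle & \langle X, Z\rangle \\ \langle Y, Y\rangle & \langle Y, Z\rangle \end{vmatrix} - \langle Y, Z\rangle \begin{vmatrix} \langle X, X\rangle & \langle X, Z\rangle \\ \langle Y, X\rangle & \langle Y, Z\rangle \end{vmatrix} .$$

Using $\langle X, Y\rangle = \langle Y, X\rangle$ etc., we see that this is just the expansion of a determinant by minors of the third row, and hence (using $\det A^T = \det A$)

$$\mathrm{Vol}^2(X, Y, Z) = \begin{vmatrix} \langle X, X\rangle & \langle X, Y\rangle & \langle X, Z\rangle \\ \langle Y, X\rangle & \langle Y, Y\rangle & \langle Y, Z\rangle \\ \langle Z, X\rangle & \langle Z, Y\rangle & \langle Z, Z\rangle \end{vmatrix} = \begin{vmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{vmatrix} \begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} = \begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix}^2 .$$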
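The derivation above can be replayed in exact arithmetic. This Python sketch computes $\mathrm{Vol}^2 = A^2\|U\|^2$ exactly as in the text: it solves the first two equations (6) by Cramer's rule, forms $U = Z - aX - bY$, and then compares the result with the Gram determinant (4) and the square of (5). The vectors are an arbitrary independent triple, and the helper names are our own.

```python
from fractions import Fraction


def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))


def det2(a, b, c, d):
    # |a b|
    # |c d|
    return a * d - b * c


def det3(M):
    # Cofactor expansion along the first row.
    return (M[0][0] * det2(M[1][1], M[1][2], M[2][1], M[2][2])
            - M[0][1] * det2(M[1][0], M[1][2], M[2][0], M[2][2])
            + M[0][2] * det2(M[1][0], M[1][1], M[2][0], M[2][1]))


def vol_squared_by_projection(X, Y, Z):
    A2 = det2(dot(X, X), dot(X, Y), dot(Y, X), dot(Y, Y))                # (3)
    a = Fraction(det2(dot(X, Z), dot(X, Y), dot(Y, Z), dot(Y, Y)), A2)    # Cramer
    b = Fraction(det2(dot(X, X), dot(X, Z), dot(Y, X), dot(Y, Z)), A2)
    U = [z - a * x - b * y for x, y, z in zip(X, Y, Z)]
    # Sanity check: U lies in the orthogonal complement of span{X, Y}.
    assert dot(U, X) == 0 and dot(U, Y) == 0
    return A2 * dot(U, U)                                                 # A^2 ||U||^2


X, Y, Z = (1, 0, 1), (2, 1, 0), (0, 1, 3)
gram = [[dot(P, Q) for Q in (X, Y, Z)] for P in (X, Y, Z)]                 # (4)
columns = [[X[i], Y[i], Z[i]] for i in range(3)]                           # (5)

print(vol_squared_by_projection(X, Y, Z), det3(gram), det3(columns) ** 2)  # 25 25 25
```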
We remark that if the collection $\{X, Y, Z\}$ is linearly dependent, then the volume of the parallelepiped degenerates to zero, since the three vectors then lie in a plane and the height of the parallelepiped is zero. This agrees with the fact that the determinant (5) vanishes when its columns are linearly dependent. We also note that the area of the base is given by

$$\|X \times Y\| = \|X\| \|Y\| \sin\angle(X, Y)$$

where the direction of the vector $X \times Y$ is up (in this case). Therefore the projection of $Z$ in the direction of $X \times Y$ is just $Z$ dotted into a unit vector in the direction of $X \times Y$. Hence the volume of the parallelepiped is given (up to sign) by the number $Z \cdot (X \times Y)$. This is the so-called scalar triple product that should be familiar from elementary courses. We leave it to the reader to show that the scalar triple product is given by the determinant (5) (see Exercise 11.6.1).

Finally, note that if any two of the vectors $X$, $Y$, $Z$ in equation (5) are interchanged, then the determinant changes sign even though the volume is unaffected (since it must be positive). This observation will form the basis for the concept of "orientation" to be defined later.
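A short Python sketch comparing the scalar triple product with determinant (5) on arbitrary sample vectors, and showing the sign change when two edges are interchanged. It is a numerical illustration only and does not replace the proof asked for in Exercise 11.6.1. The helper names are our own.

```python
def cross(X, Y):
    return (X[1] * Y[2] - X[2] * Y[1],
            X[2] * Y[0] - X[0] * Y[2],
            X[0] * Y[1] - X[1] * Y[0])


def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))


def det_columns(X, Y, Z):
    """Determinant (5): the 3 x 3 determinant whose columns are X, Y, Z."""
    (x1, x2, x3), (y1, y2, y3), (z1, z2, z3) = X, Y, Z
    return (x1 * (y2 * z3 - z2 * y3)
            - y1 * (x2 * z3 - z2 * x3)
            + z1 * (x2 * y3 - y2 * x3))


X, Y, Z = (1, 0, 1), (2, 1, 0), (0, 1, 3)
print(dot(Z, cross(X, Y)), det_columns(X, Y, Z))   # 5 5
print(dot(Z, cross(Y, X)), det_columns(Y, X, Z))   # -5 -5  (X and Y interchanged)
```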

Exercises 

1. Show that $Z \cdot (X \times Y)$ is given by the determinant in equation (5).

2. Find the area of the parallelogram whose vertices are:

(a) (0, 0), (1, 3), (-2, 1), and (-1, 4).

(b) (2, 4), (4, 5), (5, 2), and (7, 3).

(c) (-1, 3), (1, 5), (3, 2), and (5, 4).

(d) (0, 0, 0), (1, -2, 2), (3, 4, 2), and (4, 2, 4).

(e) (2, 2, 1), (3, 0, 6), (4, 1, 5), and (1, 1, 2).

3. Find the volume of the parallelepipeds whose adjacent edges are the vectors:

(a) (1, 1, 2), (3, -1, 0), and (5, 2, -1).

(b) (1, 1, 0), (1, 0, 1), and (0, 1, 1).

4. Prove both algebraically and geometrically that the parallelogram with edges $X$ and $Y$ has the same area as the parallelogram with edges $X$ and $Y + aX$ for any scalar $a$.

5. Prove both algebraically and geometrically that the volume of the parallelepiped in $\mathbb{R}^3$ with edges $X$, $Y$ and $Z$ is equal to the volume of the parallelepiped with edges $X$, $Y$ and $Z + aX + bY$ for any scalars $a$ and $b$.

6. Show that the parallelepiped in $\mathbb{R}^3$ defined by the three vectors (2, 2, 1), (1, -2, 2) and (-2, 1, 2) is a cube. Find the volume of this cube.

11.7 VOLUMES IN $\mathbb{R}^n$

Now that we have a feeling for volumes in $\mathbb{R}^3$ expressed as determinants, let us prove the analogous results in $\mathbb{R}^n$. To begin with, we note that parallelograms defined by the vectors $X$ and $Y$ in either $\mathbb{R}^2$ or $\mathbb{R}^3$ contain all points (i.e., vectors) of the form $aX + bY$ for any $a, b \in [0, 1]$. Similarly, given three linearly independent vectors $X, Y, Z \in \mathbb{R}^3$, we may define the parallelepiped with these vectors as edges to be that subset of $\mathbb{R}^3$ containing all vectors of the form $aX + bY + cZ$ where $0 \le a, b, c \le 1$. The corners of the parallelepiped are the points $\delta_1 X + \delta_2 Y + \delta_3 Z$ where each $\delta_i$ is either $0$ or $1$.

Generalizing these observations, given any $r$ linearly independent vectors $X_1, \dots, X_r \in \mathbb{R}^n$, we define an $r$-dimensional parallelepiped as the set of all vectors of the form $a^1 X_1 + \cdots + a^r X_r$ where $0 \le a^i \le 1$ for each $i = 1, \dots, r$. In $\mathbb{R}^3$, by a 1-volume we mean a length, a 2-volume means an area, and a 3-volume is just the usual volume. To define the volume of an $r$-dimensional parallelepiped we proceed by induction on $r$. In particular, if $X$ is a nonzero vector (i.e., a 1-dimensional parallelepiped) in $\mathbb{R}^n$, we define its 1-volume to be its length $\langle X, X\rangle^{1/2}$. Proceeding, suppose the $(r-1)$-dimensional volume of an $(r-1)$-dimensional parallelepiped has been defined. Let $P_r$ denote the $r$-dimensional parallelepiped defined by the $r$ linearly independent vectors $X_1, \dots, X_r$. We say that the base of $P_r$ is the $(r-1)$-dimensional parallelepiped defined by the $r - 1$ vectors $X_1, \dots, X_{r-1}$, and the height of $P_r$ is the length of the projection of $X_r$ onto the orthogonal complement in $\mathbb{R}^n$ of the space spanned by $X_1, \dots, X_{r-1}$. According to our induction hypothesis, the volume of an $(r-1)$-dimensional parallelepiped has already been defined. Therefore we define the $r$-volume of $P_r$ to be the product of its height times the $(r-1)$-dimensional volume of its base.

The reader may wonder whether or not the $r$-volume of an $r$-dimensional parallelepiped in any way depends on which of the $r$ vectors is singled out for projection. We proceed as if it does not and then, after the next theorem, we shall show that this is indeed the case.
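The inductive definition translates directly into a recursive function. The Python sketch below rests on our own choices: floating-point arithmetic, Gram-Schmidt to compute the projection onto the orthogonal complement, and arbitrary sample vectors in $\mathbb{R}^4$. Looping over all orderings of the edges anticipates the independence result shown after Theorem 11.13.

```python
import math
from itertools import permutations


def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))


def orthogonal_part(v, vectors):
    """Projection of v onto the orthogonal complement of span(vectors).

    The spanning vectors are first orthogonalized (Gram-Schmidt) so that the
    projections onto them can simply be subtracted one at a time.
    """
    ortho = []
    for w in vectors:
        for q in ortho:
            c = dot(w, q) / dot(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        ortho.append(w)
    for q in ortho:
        c = dot(v, q) / dot(q, q)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v


def r_volume(vectors):
    """Inductive definition: a 1-volume is a length; the r-volume is the
    (r-1)-volume of the base X_1, ..., X_{r-1} times the height, i.e. the
    length of the part of X_r orthogonal to the base."""
    if len(vectors) == 1:
        return math.sqrt(dot(vectors[0], vectors[0]))
    *base, last = vectors
    h = orthogonal_part(last, base)
    return r_volume(base) * math.sqrt(dot(h, h))


edges = [(1, 2, 0, 1), (0, 1, 1, 0), (2, 0, 1, 1)]   # three vectors in R^4
for order in permutations(edges):
    print(round(r_volume(list(order)), 10))            # 6.0 for every ordering
```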
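Theorem 11.13 Let $P_r$ be the $r$-dimensional parallelepiped defined by the $r$ linearly independent vectors $X_1, \dots, X_r \in \mathbb{R}^n$. Then the $r$-volume of $P_r$ is the positive square root of the determinant

$$\begin{vmatrix} \langle X_1, X_1\rangle & \langle X_1, X_2\rangle & \cdots & \langle X_1, X_r\rangle \\ \langle X_2, X_1\rangle & \langle X_2, X_2\rangle & \cdots & \langle X_2, X_r\rangle \\ \vdots & \vdots & & \vdots \\ \langle X_r, X_1\rangle & \langle X_r, X_2\rangle & \cdots & \langle X_r, X_r\rangle \end{vmatrix} . \tag{7}$$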



Proof For the case of $r = 1$, we see that the theorem is true by the definition of length (or 1-volume) of a vector. Proceeding by induction, we assume the theorem is true for an $(r-1)$-dimensional parallelepiped, and we show that it is also true for an $r$-dimensional parallelepiped. Hence, let us write

$$A^2 = \mathrm{Vol}^2(P_{r-1}) = \begin{vmatrix} \langle X_1, X_1\rangle & \langle X_1, X_2\rangle & \cdots & \langle X_1, X_{r-1}\rangle \\ \langle X_2, X_1\rangle & \langle X_2, X_2\rangle & \cdots & \langle X_2, X_{r-1}\rangle \\ \vdots & \vdots & & \vdots \\ \langle X_{r-1}, X_1\rangle & \langle X_{r-1}, X_2\rangle & \cdots & \langle X_{r-1}, X_{r-1}\rangle \end{vmatrix}$$

for the square of the volume of the $(r-1)$-dimensional base of $P_r$. Just as we did in our discussion of volumes in $\mathbb{R}^3$, we write $X_r$ in terms of its projection $U$ onto the orthogonal complement of the space spanned by the $r - 1$ vectors $X_1, \dots, X_{r-1}$. This means that we can write

$$X_r = U + a^1 X_1 + \cdots + a^{r-1} X_{r-1}$$

where $\langle U, X_i\rangle = 0$ for $i = 1, \dots, r-1$, and $\langle U, X_r\rangle = \langle U, U\rangle$. We thus have the system of equations

$$\begin{aligned} a^1 \langle X_1, X_1\rangle + a^2 \langle X_1, X_2\rangle + \cdots + a^{r-1} \langle X_1, X_{r-1}\rangle &= \langle X_1, X_r\rangle \\ a^1 \langle X_2, X_1\rangle + a^2 \langle X_2, X_2\rangle + \cdots + a^{r-1} \langle X_2, X_{r-1}\rangle &= \langle X_2, X_r\rangle \\ &\ \ \vdots \\ a^1 \langle X_{r-1}, X_1\rangle + a^2 \langle X_{r-1}, X_2\rangle + \cdots + a^{r-1} \langle X_{r-1}, X_{r-1}\rangle &= \langle X_{r-1}, X_r\rangle \end{aligned}$$

We write $M_1, \dots, M_{r-1}$ for the minors of the first $r - 1$ elements of the last row in (7). Solving the above system for the $a^i$ using Cramer's rule, we obtain

$$\begin{aligned} A^2 a^1 &= (-1)^{r-2} M_1 \\ A^2 a^2 &= (-1)^{r-3} M_2 \\ &\ \ \vdots \\ A^2 a^{r-1} &= M_{r-1} \end{aligned}$$

where the factors of $(-1)^{r-k-1}$ in $A^2 a^k$ result from moving the last column of (7) over to become the $k$th column of the $k$th minor matrix. Using this result, we now have

$$\begin{aligned} A^2 U &= A^2(-a^1 X_1 - a^2 X_2 - \cdots - a^{r-1} X_{r-1} + X_r) \\ &= (-1)^{r-1} M_1 X_1 + (-1)^{r-2} M_2 X_2 + \cdots + (-1) M_{r-1} X_{r-1} + A^2 X_r \end{aligned}$$

and hence, using $\|U\|^2 = \langle U, U\rangle = \langle U, X_r\rangle$, we find that (since $(-1)^{-k} = (-1)^k$)

$$\begin{aligned} A^2 \|U\|^2 &= A^2 \langle U, X_r\rangle \\ &= (-1)^{r-1} M_1 \langle X_r, X_1\rangle + (-1)(-1)^{r-1} M_2 \langle X_r, X_2\rangle + \cdots + A^2 \langle X_r, X_r\rangle \\ &= (-1)^{r-1} \big[ M_1 \langle X_r, X_1\rangle - M_2 \langle X_r, X_2\rangle + \cdots + (-1)^{r-1} A^2 \langle X_r, X_r\rangle \big] . \end{aligned}$$

Now note that the right hand side of this equation is precisely the expansion of (7) by minors of the last row, and the left hand side is by definition the square of the $r$-volume of the $r$-dimensional parallelepiped $P_r$. This also shows that the determinant (7) is positive. ∎

This result may also be expressed in terms of the matrix $(\langle X_i, X_j\rangle)$ as

$$\mathrm{Vol}(P_r) = [\det(\langle X_i, X_j\rangle)]^{1/2} .$$

The most useful form of this theorem is given in the following corollary.

Corollary The $n$-volume of the $n$-dimensional parallelepiped in $\mathbb{R}^n$ defined by the vectors $X_1, \dots, X_n$, where each $X_i$ has coordinates $(x^1{}_i, \dots, x^n{}_i)$, is the absolute value of the determinant of the matrix $X$ given by

$$X = \begin{pmatrix} x^1{}_1 & x^1{}_2 & \cdots & x^1{}_n \\ x^2{}_1 & x^2{}_2 & \cdots & x^2{}_n \\ \vdots & \vdots & & \vdots \\ x^n{}_1 & x^n{}_2 & \cdots & x^n{}_n \end{pmatrix} .$$

Proof Note that $(\det X)^2 = (\det X^T)(\det X) = \det(X^T X)$. Since the $(i, j)$th entry of $X^T X$ is $\langle X_i, X_j\rangle$, this is just the determinant (7) in Theorem 11.13, which is the square of the volume. In other words, $\mathrm{Vol}(P_n) = |\det X|$. ∎
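The corollary can be checked in exact arithmetic with the Python sketch below. For $n$ vectors in $\mathbb{R}^n$ the Gram determinant (7) equals $(\det X)^2$, while for $r < n$ vectors only (7) is available. The helper names are our own and the vectors are arbitrary.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination with Fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return result


def gram(vectors):
    """The matrix (<X_i, X_j>) whose determinant is (7)."""
    return [[sum(a * b for a, b in zip(P, Q)) for Q in vectors] for P in vectors]


def matrix_with_columns(vectors):
    return [list(row) for row in zip(*vectors)]


# n vectors in R^n: det(X^T X) = (det X)^2, so Vol(P_n) = |det X|.
cols = [(1, 0, 1), (2, 1, 0), (0, 1, 3)]
print(det(gram(cols)), det(matrix_with_columns(cols)) ** 2)     # 25 25

# r < n: only the Gram determinant (7) makes sense; here Vol^2 = 36.
print(det(gram([(1, 2, 0, 1), (0, 1, 1, 0), (2, 0, 1, 1)])))      # 36
```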

Prior to this theorem, we asked whether or not the $r$-volume depended on which of the $r$ vectors is singled out for projection. We can now easily show that it does not. Suppose that we have an $r$-dimensional parallelepiped defined by $r$ linearly independent vectors, and let us label these vectors $X_1, \dots, X_r$. According to Theorem 11.13, we project $X_r$ onto the space orthogonal to the space spanned by $X_1, \dots, X_{r-1}$, and this leads to the determinant (7). If we wish to project any other vector instead, then we may simply relabel these $r$ vectors to put a different one into position $r$. In other words, we have made some permutation of the indices in (7). However, remember that any permutation is a product of transpositions (Theorem 1.2), and hence we need only consider the effect of a single interchange of two indices.

Notice, for example, that the indices $1$ and $r$ only occur in rows $1$ and $r$ as well as in columns $1$ and $r$. And in general, indices $i$ and $j$ only occur in the $i$th and $j$th rows and columns. But we also see that the matrix corresponding to (7) is symmetric about the main diagonal in these indices. Hence an interchange of the indices $i$ and $j$ has the effect of interchanging both rows $i$ and $j$ and columns $i$ and $j$ in exactly the same manner. Each of these two interchanges changes the sign of the determinant, so the two sign changes cancel and the determinant (7) remains unchanged. In particular, it always remains positive. It now follows that the volume we have defined is indeed independent of which of the $r$ vectors is singled out to be the height of the parallelepiped.

Now note that according to the above corollary, we know that $\mathrm{Vol}(P_n) = \mathrm{Vol}(X_1, \dots, X_n) = |\det X|$, which is always positive. Our discussion just showed that $\mathrm{Vol}(X_1, \dots, X_n)$ is independent of any permutation of indices. The actual value of $\det X$, however, changes sign under any odd permutation of the $X_i$ (that is, of the columns of $X$). Because of this, we say that the vectors $(X_1, \dots, X_n)$ are positively oriented if $\det X > 0$, and negatively oriented if $\det X < 0$. Thus the orientation of a set of vectors depends on the order in which they are written. To take into account the sign of $\det X$, we define the oriented volume $\mathrm{Vol}_o(X_1, \dots, X_n)$ to be $+\mathrm{Vol}(X_1, \dots, X_n)$ if $\det X > 0$, and $-\mathrm{Vol}(X_1, \dots, X_n)$ if $\det X < 0$. We will return to a careful discussion of orientation in a later section. We also remark that $\det X$ is always nonzero as long as the vectors $(X_1, \dots, X_n)$ are linearly independent. Thus the above corollary may be expressed in the form

$$\mathrm{Vol}_o(X_1, \dots, X_n) = \det(X_1, \dots, X_n)$$

where $\det(X_1, \dots, X_n)$ means the determinant as a function of the column vectors $X_i$.
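The following Python sketch reorders three independent vectors in every possible way. The Gram determinant (7) never changes, while $\det X$ picks up the sign of the permutation, and $\mathrm{Vol}_o$ as defined above coincides with $\det X$. The vectors are arbitrary and the helper names are our own.

```python
import math
from itertools import permutations


def dot(P, Q):
    return sum(p * q for p, q in zip(P, Q))


def det3(M):
    # Cofactor expansion along the first row.
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def gram(vectors):
    # The matrix (<X_i, X_j>) of determinant (7).
    return [[dot(P, Q) for Q in vectors] for P in vectors]


def det_columns(vectors):
    # det X, where X has the given vectors as its columns.
    return det3([list(row) for row in zip(*vectors)])


def parity(perm):
    # +1 for an even permutation, -1 for an odd one (count inversions).
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def oriented_volume(vectors):
    # Vol_o = +Vol if det X > 0 and -Vol if det X < 0 (vectors assumed independent).
    vol = math.sqrt(det3(gram(vectors)))
    return vol if det_columns(vectors) > 0 else -vol


vectors = [(1, 0, 1), (2, 1, 0), (0, 1, 3)]
for perm in permutations(range(3)):
    ordered = [vectors[k] for k in perm]
    print(perm, parity(perm), det3(gram(ordered)), det_columns(ordered),
          oriented_volume(ordered))
```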

Exercises 

1. Find the 3-volume of the three-dimensional parallelepipeds in $\mathbb{R}^4$ defined by the vectors:

(a) (2, 1, 0, -1), (3, -1, 5, 2), and (0, 4, -1, 2).

(b) (1, 1, 0, 0), (0, 2, 2, 0), and (0, 0, 3, 3).

2. Find the 2-volume of the parallelogram in $\mathbb{R}^4$ two of whose edges are the vectors (1, 3, -1, 6) and (-1, 2, 4, 3).

3. Prove that if the vectors $X_1, X_2, \dots, X_r$ are mutually orthogonal, the $r$-volume of the parallelepiped defined by them is equal to the product of their lengths.

4. Prove that $r$ vectors $X_1, X_2, \dots, X_r$ in $\mathbb{R}^n$ are linearly dependent if and only if the determinant (7) is equal to zero.

11.8 LINEAR TRANSFORMATIONS AND VOLUMES

One of the most useful applications of Theorem 11.13 and its corollary relates to linear mappings. In fact, this is the approach usually followed in deriving the change of variables formula for multiple integrals. Let $\{e_i\}$ be an orthonormal basis for $\mathbb{R}^n$, and let $C_n$ denote the unit cube in $\mathbb{R}^n$. In other words,

$$C_n = \{t^1 e_1 + \cdots + t^n e_n \in \mathbb{R}^n : 0 \le t^i \le 1\} .$$

This is similar to the definition of $P_r$ given previously.

Now let $A : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. Then the matrix of $A$ relative to the basis $\{e_i\}$ is defined by $A(e_i) = e_j a^j{}_i$. Let us write the image of $e_i$ as $X_i$, so that $X_i = A(e_i) = e_j a^j{}_i$. This means that the column vector $X_i$ has components $(a^1{}_i, \dots, a^n{}_i)$. Under the transformation $A$, the image of $C_n$ becomes

$$A(C_n) = A\Big(\sum t^i e_i\Big) = \sum t^i A(e_i) = \sum t^i X_i$$

(where $0 \le t^i \le 1$), which is just the parallelepiped $P_n$ spanned by the vectors $(X_1, \dots, X_n)$. Therefore the volume of $P_n = A(C_n)$ is given by

$$|\det(X_1, \dots, X_n)| = |\det(a^j{}_i)| .$$

Recalling that the determinant of a linear transformation is defined to be the determinant of its matrix representation, we have proved the next result.

Theorem 11.14 Let $C_n$ be the unit cube in $\mathbb{R}^n$ spanned by the orthonormal basis vectors $\{e_i\}$. If $A : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation and $P_n = A(C_n)$, then $\mathrm{Vol}(P_n) = \mathrm{Vol}\, A(C_n) = |\det A|$.

It is quite simple to generalize this result somewhat to include the image of an $n$-dimensional parallelepiped under a linear transformation $A$. First, we note that any parallelepiped $P_n$ is just the image of $C_n$ under some linear transformation $B$. Indeed, if $P_n = \{t^1 X_1 + \cdots + t^n X_n : 0 \le t^i \le 1\}$ for some set of vectors $X_i$, then we may define the transformation $B$ by $B(e_i) = X_i$, and hence $P_n = B(C_n)$. Thus

$$A(P_n) = A(B(C_n)) = (A \circ B)(C_n)$$

and therefore (using Theorem 11.14 along with the fact that the matrix of the composition of two transformations is the matrix product)

$$\begin{aligned} \mathrm{Vol}\, A(P_n) &= \mathrm{Vol}[(A \circ B)(C_n)] = |\det(A \circ B)| = |\det A|\, |\det B| \\ &= |\det A|\, \mathrm{Vol}(P_n) . \end{aligned}$$

In other words, $|\det A|$ is a measure of how much the volume of the parallelepiped changes under the linear transformation $A$. See the figure below for a picture of this in $\mathbb{R}^2$.

We summarize this discussion as a corollary to Theorem 11.14.

Corollary Suppose $P_n$ is an $n$-dimensional parallelepiped in $\mathbb{R}^n$, and let $A : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. Then $\mathrm{Vol}\, A(P_n) = |\det A|\, \mathrm{Vol}(P_n)$.
[Figure: the basis vectors $e_1$, $e_2$ and their images $X_1 = B(e_1)$, $X_2 = B(e_2)$.]
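Here is a small two-dimensional check of the corollary, as a Python sketch. The area of $A(P_2)$ is computed independently with the shoelace formula from the corners $0$, $A(X_1)$, $A(X_1) + A(X_2)$, $A(X_2)$, and then compared with $|\det A|\,\mathrm{Vol}(P_2)$. The matrices are arbitrary examples. The shoelace formula is our own choice of independent check and is not used in the text.

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def shoelace(points):
    """Area of a simple polygon whose vertices are listed in order."""
    s = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


A = [[2, 1],
     [0, 3]]          # the linear transformation A (det A = 6)
B = [[1, -1],
     [1, 2]]          # B(e_1) = X_1 = (1, 1), B(e_2) = X_2 = (-1, 2)

AB = matmul(A, B)     # columns are A(X_1) and A(X_2)
AX1, AX2 = (AB[0][0], AB[1][0]), (AB[0][1], AB[1][1])
image_corners = [(0, 0), AX1, (AX1[0] + AX2[0], AX1[1] + AX2[1]), AX2]

vol_P = abs(det2(B))                                   # Vol(P_2) = |det B|
print(shoelace(image_corners), abs(det2(A)) * vol_P)   # 18.0 18
```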

Now that we have an intuitive grasp of these concepts, let us look at this material from the point of view of exterior algebra. This more sophisticated approach is of great use in the theory of integration.

Let $U$ and $V$ be real vector spaces. Recall from Theorem 9.7 that given a linear transformation $T \in L(U, V)$, we defined the transpose mapping $T^* \in L(V^*, U^*)$ by

$$T^*\omega = \omega \circ T$$

for all $\omega \in V^*$. By this we mean if $u \in U$, then $T^*\omega(u) = \omega(Tu)$. As we then saw in Theorem 9.8, if $U$ and $V$ are finite-dimensional and $A$ is the matrix representation of $T$, then $A^T$ is the matrix representation of $T^*$, and hence certain properties of $T^*$ follow naturally. For example, if $T_1 \in L(V, W)$ and $T_2 \in L(U, V)$, then $(T_1 \circ T_2)^* = T_2^* \circ T_1^*$ (Theorem 3.18), and if $T$ is nonsingular, then $(T^{-1})^* = (T^*)^{-1}$ (Corollary 4 of Theorem 3.21).

Now suppose that $\{e_i\}$ is a basis for $U$ and $\{f_j\}$ is a basis for $V$. To keep the notation simple and understandable, let us write the corresponding dual bases as $\{e^i\}$ and $\{f^j\}$. We define the matrix $A = (a^j{}_i)$ of $T$ by $Te_i = f_j a^j{}_i$. Then (just as in the proof of Theorem 9.8)

$$(T^* f^i)e_j = f^i(Te_j) = f^i(f_k a^k{}_j) = a^k{}_j f^i(f_k) = a^k{}_j \delta^i{}_k = a^i{}_j = a^i{}_k \delta^k{}_j = a^i{}_k e^k(e_j)$$

which shows that

$$T^* f^i = a^i{}_k e^k . \tag{8}$$

We will use this result frequently below.
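Equation (8) says that the components of $T^* f^i$ with respect to $\{e^k\}$ form the $i$th row of $A$, so the matrix of $T^*$ relative to the dual bases is $A^T$. The Python sketch below checks this on an arbitrary $2 \times 3$ example. It represents vectors as coordinate lists and functionals as Python functions, and all names are hypothetical.

```python
def apply(A, u):
    """T(u) in V-coordinates, where column j of A holds the components of T(e_j)."""
    return [sum(A[i][k] * u[k] for k in range(len(u))) for i in range(len(A))]


def basis_vector(n, j):
    return [1 if k == j else 0 for k in range(n)]


def dual_basis_functional(i):
    """f^i: picks out the i-th coordinate, so that f^i(f_k) = delta^i_k."""
    return lambda v: v[i]


def transpose_map(A, omega):
    """T* omega = omega o T, a linear functional on U."""
    return lambda u: omega(apply(A, u))


# T : U = R^3 -> V = R^2 with T(e_j) = f_i a^i_j.
A = [[1, 2, 3],
     [4, 5, 6]]
m, n = 3, 2

# Components of T* f^i relative to {e^k} are the numbers (T* f^i)(e_k).
components = [[transpose_map(A, dual_basis_functional(i))(basis_vector(m, k))
               for k in range(m)] for i in range(n)]
print(components)   # [[1, 2, 3], [4, 5, 6]], i.e. T* f^i = a^i_k e^k
```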

We now generalize our definition of the transpose. If $\phi \in L(U, V)$ and $T \in \mathcal{T}_r(V)$, we define the pull-back $\phi^* \in L(\mathcal{T}_r(V), \mathcal{T}_r(U))$ by

$$(\phi^* T)(u_1, \dots, u_r) = T(\phi(u_1), \dots, \phi(u_r))$$

where $u_1, \dots, u_r \in U$. Note that in the particular case of $r = 1$, the mapping $\phi^*$ is just the transpose of $\phi$. It should also be clear from the definition that $\phi^*$ is indeed a linear transformation, and hence

$$\phi^*(aT_1 + bT_2) = a\phi^* T_1 + b\phi^* T_2 .$$

We also emphasize that $\phi$ need not be an isomorphism for us to define $\phi^*$. The main properties of the pull-back are given in the next theorem.

Theorem 11.15 If $\phi \in L(U, V)$ and $\psi \in L(V, W)$, then

(a) $(\psi \circ \phi)^* = \phi^* \circ \psi^*$.

(b) If $I \in L(U)$ is the identity map, then $I^*$ is the identity in $L(\mathcal{T}_r(U))$.

(c) If $\phi$ is an isomorphism, then so is $\phi^*$, and $(\phi^*)^{-1} = (\phi^{-1})^*$.

(d) If $T_1 \in \mathcal{T}_{r_1}(V)$ and $T_2 \in \mathcal{T}_{r_2}(V)$, then

$$\phi^*(T_1 \otimes T_2) = (\phi^* T_1) \otimes (\phi^* T_2) .$$

(e) Let $U$ have basis $\{e_1, \dots, e_m\}$, $V$ have basis $\{f_1, \dots, f_n\}$ and suppose that $\phi(e_i) = f_j a^j{}_i$. If $T \in \mathcal{T}_r(V)$ has components $T_{i_1 \cdots i_r} = T(f_{i_1}, \dots, f_{i_r})$, then the components of $\phi^* T$ relative to the basis $\{e_i\}$ are given by

$$(\phi^* T)_{j_1 \cdots j_r} = T_{i_1 \cdots i_r} a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r} .$$

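Proof (a) Note that $\psi \circ \phi : U \to W$, and hence $(\psi \circ \phi)^* : \mathcal{T}_r(W) \to \mathcal{T}_r(U)$. Thus for any $T \in \mathcal{T}_r(W)$ and $u_1, \dots, u_r \in U$ we have

$$\begin{aligned} ((\psi \circ \phi)^* T)(u_1, \dots, u_r) &= T(\psi(\phi(u_1)), \dots, \psi(\phi(u_r))) \\ &= (\psi^* T)(\phi(u_1), \dots, \phi(u_r)) \\ &= ((\phi^* \circ \psi^*) T)(u_1, \dots, u_r) . \end{aligned}$$

(b) Obvious from the definition of $I^*$.

(c) If $\phi$ is an isomorphism, then $\phi^{-1}$ exists and we have (using (a) and (b))

$$\phi^* \circ (\phi^{-1})^* = (\phi^{-1} \circ \phi)^* = I^* .$$

Similarly $(\phi^{-1})^* \circ \phi^* = I^*$. Hence $(\phi^*)^{-1}$ exists and is equal to $(\phi^{-1})^*$.

(d) This follows directly from the definitions (see Exercise 11.8.1).

(e) Using the definitions, we have

$$\begin{aligned} (\phi^* T)_{j_1 \cdots j_r} &= (\phi^* T)(e_{j_1}, \dots, e_{j_r}) \\ &= T(\phi(e_{j_1}), \dots, \phi(e_{j_r})) \\ &= T(f_{i_1} a^{i_1}{}_{j_1}, \dots, f_{i_r} a^{i_r}{}_{j_r}) \\ &= T_{i_1 \cdots i_r} a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r} . \end{aligned}$$

Alternatively, if $\{e^i\}$ and $\{f^j\}$ are the bases dual to $\{e_i\}$ and $\{f_j\}$ respectively, then $T = T_{i_1 \cdots i_r} f^{i_1} \otimes \cdots \otimes f^{i_r}$ and consequently (using the linearity of $\phi^*$, part (d) and equation (8)),

$$\begin{aligned} \phi^* T &= T_{i_1 \cdots i_r}\, \phi^*(f^{i_1}) \otimes \cdots \otimes \phi^*(f^{i_r}) \\ &= T_{i_1 \cdots i_r} a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r}\, e^{j_1} \otimes \cdots \otimes e^{j_r} \end{aligned}$$

which therefore yields the same result. ∎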
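For $r = 2$ the component formula of part (e) reads $(\phi^*T)_{jk} = T_{il}\, a^i{}_j a^l{}_k$, which is the matrix $A^T (T_{il}) A$. The Python sketch below compares this formula with the definition of the pull-back. The sample data are arbitrary and the helper names are our own.

```python
from itertools import product


def apply(A, u):
    # phi(u) in V-coordinates; column j of A holds the components of phi(e_j).
    return [sum(A[i][k] * u[k] for k in range(len(u))) for i in range(len(A))]


def bilinear_from_components(Tc):
    """T(v, w) = T_{il} v^i w^l for a tensor T in T_2(V) with components Tc."""
    n = len(Tc)
    return lambda v, w: sum(Tc[i][l] * v[i] * w[l] for i, l in product(range(n), repeat=2))


def pullback(A, T):
    """(phi* T)(u_1, u_2) = T(phi(u_1), phi(u_2))."""
    return lambda u1, u2: T(apply(A, u1), apply(A, u2))


def basis_vector(m, j):
    return [1 if k == j else 0 for k in range(m)]


A = [[1, 0, 2],
     [3, 1, -1]]             # phi : R^3 -> R^2, phi(e_j) = f_i a^i_j
Tc = [[2, 1],
      [0, -1]]               # components T_{il} = T(f_i, f_l)
m, n = 3, 2

pulled = pullback(A, bilinear_from_components(Tc))
by_definition = [[pulled(basis_vector(m, j), basis_vector(m, k)) for k in range(m)]
                 for j in range(m)]
by_formula = [[sum(Tc[i][l] * A[i][j] * A[l][k] for i, l in product(range(n), repeat=2))
               for k in range(m)] for j in range(m)]
print(by_definition == by_formula)   # True
```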

For our present purposes, we will only need to consider the pull-back as defined on the space $\Lambda^r(V)$ rather than on $\mathcal{T}_r(V)$. Therefore, if $\phi \in L(U, V)$ then $\phi^* \in L(\mathcal{T}_r(V), \mathcal{T}_r(U))$, and hence we see that for $\omega \in \Lambda^r(V)$ we have $(\phi^*\omega)(u_1, \dots, u_r) = \omega(\phi(u_1), \dots, \phi(u_r))$. Interchanging two of the $u_i$ interchanges the corresponding $\phi(u_i)$, so $\phi^*\omega$ is again antisymmetric. This shows that $\phi^*(\Lambda^r(V)) \subset \Lambda^r(U)$. Parts (d) and (e) of Theorem 11.15 applied to the space $\Lambda^r(V)$ yield the following special cases. (Recall that $|i_1 \cdots i_r|$ means the sum is over increasing indices $i_1 < \cdots < i_r$.)

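Theorem 11.16 Suppose $\phi \in L(U, V)$, $\alpha \in \Lambda^r(V)$ and $\beta \in \Lambda^s(V)$. Then

(a) $\phi^*(\alpha \wedge \beta) = (\phi^*\alpha) \wedge (\phi^*\beta)$.

(b) Let $U$ and $V$ have bases $\{e_i\}$ and $\{f_i\}$ respectively, and let $U^*$ and $V^*$ have bases $\{e^i\}$ and $\{f^i\}$. Write $\phi(e_i) = f_j a^j{}_i$ and $\phi^*(f^i) = a^i{}_j e^j$. If $\alpha = \alpha_{|i_1 \cdots i_r|} f^{i_1} \wedge \cdots \wedge f^{i_r} \in \Lambda^r(V)$, then

$$\phi^*\alpha = \hat\alpha_{|k_1 \cdots k_r|}\, e^{k_1} \wedge \cdots \wedge e^{k_r}$$

where

$$\hat\alpha_{|k_1 \cdots k_r|} = \alpha_{|i_1 \cdots i_r|}\, \varepsilon^{j_1 \cdots j_r}_{k_1 \cdots k_r}\, a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r} .$$

Thus we may write

$$\hat\alpha_{k_1 \cdots k_r} = \alpha_{|i_1 \cdots i_r|} \det(a^I{}_K)$$

where

$$\det(a^I{}_K) = \begin{vmatrix} a^{i_1}{}_{k_1} & \cdots & a^{i_1}{}_{k_r} \\ \vdots & & \vdots \\ a^{i_r}{}_{k_1} & \cdots & a^{i_r}{}_{k_r} \end{vmatrix} .$$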

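Proof (a) For simplicity, let us write $(\phi^*\alpha)(u_J) = \alpha(\phi(u_J))$ instead of $(\phi^*\alpha)(u_1, \dots, u_r) = \alpha(\phi(u_1), \dots, \phi(u_r))$ (see the discussion following Theorem 11.4). Then, in an obvious notation, we have

$$\begin{aligned} [\phi^*(\alpha \wedge \beta)](u_I) &= (\alpha \wedge \beta)(\phi(u_I)) \\ &= \sum_{J, K} \varepsilon^{JK}_{I}\, \alpha(\phi(u_J))\, \beta(\phi(u_K)) \\ &= \sum_{J, K} \varepsilon^{JK}_{I}\, (\phi^*\alpha)(u_J)\, (\phi^*\beta)(u_K) \\ &= [(\phi^*\alpha) \wedge (\phi^*\beta)](u_I) . \end{aligned}$$

By induction, this also obviously applies to the wedge product of a finite number of forms.

(b) From $\alpha = \alpha_{|i_1 \cdots i_r|} f^{i_1} \wedge \cdots \wedge f^{i_r}$ and $\phi^*(f^i) = a^i{}_j e^j$, we have (using part (a) and the linearity of $\phi^*$)

$$\begin{aligned} \phi^*\alpha &= \alpha_{|i_1 \cdots i_r|}\, \phi^*(f^{i_1}) \wedge \cdots \wedge \phi^*(f^{i_r}) \\ &= \alpha_{|i_1 \cdots i_r|}\, a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r}\, e^{j_1} \wedge \cdots \wedge e^{j_r} . \end{aligned}$$

But

$$e^{j_1} \wedge \cdots \wedge e^{j_r} = \varepsilon^{j_1 \cdots j_r}_{|k_1 \cdots k_r|}\, e^{k_1} \wedge \cdots \wedge e^{k_r}$$

and hence we have

$$\phi^*\alpha = \hat\alpha_{|k_1 \cdots k_r|}\, e^{k_1} \wedge \cdots \wedge e^{k_r}$$

where

$$\hat\alpha_{|k_1 \cdots k_r|} = \alpha_{|i_1 \cdots i_r|}\, \varepsilon^{j_1 \cdots j_r}_{k_1 \cdots k_r}\, a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r} .$$

Finally, from the definition of determinant we see that

$$\varepsilon^{j_1 \cdots j_r}_{k_1 \cdots k_r}\, a^{i_1}{}_{j_1} \cdots a^{i_r}{}_{j_r} = \begin{vmatrix} a^{i_1}{}_{k_1} & \cdots & a^{i_1}{}_{k_r} \\ \vdots & & \vdots \\ a^{i_r}{}_{k_1} & \cdots & a^{i_r}{}_{k_r} \end{vmatrix} .$$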
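For $r = 2$ the theorem can be checked directly. The coefficient of $e^{k_1} \wedge e^{k_2}$ ($k_1 < k_2$) in $\phi^*\alpha$ is $(\phi^*\alpha)(e_{k_1}, e_{k_2})$, and part (b) says it equals the $2 \times 2$ minor of $(a^i{}_j)$ with rows $I$ and columns $K$. The Python sketch below assumes the determinant normalization of the wedge product stated in its docstring. The matrix and the choice $\alpha = f^1 \wedge f^3$ are arbitrary.

```python
from itertools import combinations


def apply(A, u):
    # phi(u); column j of A holds the components of phi(e_j) = f_i a^i_j.
    return [sum(A[i][k] * u[k] for k in range(len(u))) for i in range(len(A))]


def wedge2(i1, i2):
    """The 2-form f^{i1} ^ f^{i2}, assuming the convention
    (f^{i1} ^ f^{i2})(v, w) = v^{i1} w^{i2} - v^{i2} w^{i1}."""
    return lambda v, w: v[i1] * w[i2] - v[i2] * w[i1]


def basis_vector(m, j):
    return [1 if k == j else 0 for k in range(m)]


A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]               # phi : U = R^3 -> V = R^3
m = len(A)
I = (0, 2)                    # alpha = f^1 ^ f^3 (0-based indices)
alpha = wedge2(*I)

coefficients = {}
for K in combinations(range(m), 2):       # increasing k_1 < k_2
    # coefficient of e^{k1} ^ e^{k2} in phi* alpha, straight from the definition
    direct = alpha(apply(A, basis_vector(m, K[0])), apply(A, basis_vector(m, K[1])))
    # Theorem 11.16(b): the minor det(a^I_K) with rows I and columns K
    minor = A[I[0]][K[0]] * A[I[1]][K[1]] - A[I[0]][K[1]] * A[I[1]][K[0]]
    coefficients[K] = (direct, minor)
print(coefficients)   # {(0, 1): (-8, -8), (0, 2): (1, 1), (1, 2): (2, 2)}
```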
Example 11.11 (This is a continuation of Example 11.1.) An important example of $\phi^*\alpha$ is related to the change of variables formula in multiple integrals. While we are not in any position to present this material in detail, the idea is this. Suppose we consider the spaces $U = \mathbb{R}^3(u, v, w)$ and $V = \mathbb{R}^3(x, y, z)$, where the letters in parentheses tell us the coordinate system used for that particular copy of $\mathbb{R}^3$. Note that if we write $(x, y, z) = (x^1, x^2, x^3)$ and $(u, v, w) = (u^1, u^2, u^3)$, then from elementary calculus we know that $dx^i = (\partial x^i/\partial u^j)\, du^j$ and $\partial/\partial u^i = (\partial x^j/\partial u^i)(\partial/\partial x^j)$.

Now recall from Example 11.1 that at each point of $\mathbb{R}^3(u, v, w)$, the tangent space has the basis $\{e_i\} = \{\partial/\partial u^i\}$ and the cotangent space has the corresponding dual basis $\{e^i\} = \{du^i\}$, with a similar result for $\mathbb{R}^3(x, y, z)$. Let us define $\phi : \mathbb{R}^3(u, v, w) \to \mathbb{R}^3(x, y, z)$ by

$$\phi(\partial/\partial u^i) = (\partial x^j/\partial u^i)(\partial/\partial x^j) = a^j{}_i\, (\partial/\partial x^j) .$$

It is then apparent that (see equation (8))

$$\phi^*(dx^i) = a^i{}_j\, du^j = (\partial x^i/\partial u^j)\, du^j$$

as we should have expected. We now apply this to the 3-form

$$\alpha = \alpha_{123}\, dx^1 \wedge dx^2 \wedge dx^3 = dx \wedge dy \wedge dz \in \Lambda^3(V) .$$

Since we are dealing with a 3-form in a 3-dimensional space, we must have

$$\phi^*\alpha = \hat\alpha\, du \wedge dv \wedge dw$$

where $\hat\alpha = \hat\alpha_{123}$ consists of the single term given by the determinant

$$\begin{vmatrix} a^1{}_1 & a^1{}_2 & a^1{}_3 \\ a^2{}_1 & a^2{}_2 & a^2{}_3 \\ a^3{}_1 & a^3{}_2 & a^3{}_3 \end{vmatrix} = \begin{vmatrix} \partial x^1/\partial u^1 & \partial x^1/\partial u^2 & \partial x^1/\partial u^3 \\ \partial x^2/\partial u^1 & \partial x^2/\partial u^2 & \partial x^2/\partial u^3 \\ \partial x^3/\partial u^1 & \partial x^3/\partial u^2 & \partial x^3/\partial u^3 \end{vmatrix}$$

which the reader may recognize as the so-called Jacobian of the transformation. This determinant is usually written as $\partial(x, y, z)/\partial(u, v, w)$, and hence we see that

$$\phi^*(dx \wedge dy \wedge dz) = \frac{\partial(x, y, z)}{\partial(u, v, w)}\, du \wedge dv \wedge dw .$$

This is precisely how volume elements transform (at least locally), and hence we have formulated the change of variables formula in quite general terms. ∆
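As a concrete instance (our choice, not the text's), take $(u, v, w)$ to be spherical coordinates: the radius $r$, a polar angle and an azimuth. For these the Jacobian is the familiar $r^2 \sin(\text{polar angle})$. The Python sketch below approximates the matrix $(a^i{}_j) = (\partial x^i/\partial u^j)$ by central differences and takes its determinant. The step size `h` and the sample point are arbitrary choices.

```python
import math


def spherical_to_cartesian(u):
    """(x, y, z) as functions of (r, polar angle, azimuth)."""
    r, polar, azimuth = u
    return (r * math.sin(polar) * math.cos(azimuth),
            r * math.sin(polar) * math.sin(azimuth),
            r * math.cos(polar))


def jacobian_matrix(f, u, h=1e-6):
    """a^i_j ~ dx^i/du^j by central differences (i = row, j = column)."""
    n = len(u)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(u)
        down = list(u)
        up[j] += h
        down[j] -= h
        fp, fm = f(up), f(down)
        for i in range(n):
            a[i][j] = (fp[i] - fm[i]) / (2 * h)
    return a


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


u0 = (2.0, 0.7, 1.3)                      # an arbitrary point (r, polar, azimuth)
numeric = det3(jacobian_matrix(spherical_to_cartesian, u0))
exact = u0[0] ** 2 * math.sin(u0[1])      # r^2 sin(polar)
print(numeric, exact)                     # agree to roughly 1e-9 relative error
```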

This formalism allows us to define the determinant of a linear transformation in an interesting abstract manner. To see this, suppose $\phi \in L(V)$ where $\dim V = n$. Since $\dim \Lambda^n(V) = 1$, we may choose any nonzero $\omega_0 \in \Lambda^n(V)$ as a basis. Then $\phi^* : \Lambda^n(V) \to \Lambda^n(V)$ is linear, and hence for any $\omega = c_0\omega_0 \in \Lambda^n(V)$ we have

$$\phi^*\omega = \phi^*(c_0\omega_0) = c_0\phi^*\omega_0 = c_0 c\,\omega_0 = c(c_0\omega_0) = c\,\omega$$

for some scalar $c$ (since $\phi^*\omega_0 \in \Lambda^n(V)$ is necessarily of the form $c\,\omega_0$). This result did not depend on the scalar $c_0$, and hence is independent of $\omega = c_0\omega_0$, so the scalar $c$ must be unique. We therefore define the determinant of $\phi$ to be the unique scalar, denoted by $\det\phi$, such that

$$\phi^*\omega = (\det\phi)\,\omega .$$

It is important to realize that this definition of the determinant does not depend on any choice of basis for $V$. However, let $\{e_i\}$ be a basis for $V$, and define the matrix $(a^i{}_j)$ of $\phi$ by $\phi(e_i) = e_j a^j{}_i$. Then for any nonzero $\omega \in \Lambda^n(V)$ we have

$$(\phi^*\omega)(e_1, \dots, e_n) = (\det\phi)\,\omega(e_1, \dots, e_n) .$$

On the other hand, Example 11.2 shows us that

$$\begin{aligned} (\phi^*\omega)(e_1, \dots, e_n) &= \omega(\phi(e_1), \dots, \phi(e_n)) \\ &= a^{i_1}{}_1 \cdots a^{i_n}{}_n\, \omega(e_{i_1}, \dots, e_{i_n}) \\ &= (\det(a^i{}_j))\, \omega(e_1, \dots, e_n) . \end{aligned}$$

Since $\omega \neq 0$ and $\{e_i\}$ is a basis, $\omega(e_1, \dots, e_n) \neq 0$, and we have therefore proved the next result.

Theorem 11.17 If $V$ has basis $\{e_1, \dots, e_n\}$ and $\phi \in L(V)$ has the matrix representation $(a^i{}_j)$ defined by $\phi(e_i) = e_j a^j{}_i$, then $\det\phi = \det(a^i{}_j)$.

In other words, our abstract definition of the determinant is exactly the 
same as our earlier classical definition. In fact, it is now easy to derive some 
of the properties of the determinant that were not exactly simple to prove in 
the more traditional manner. 
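Theorem 11.17 can be illustrated with a small computation. The sketch below (Python, exact arithmetic) evaluates $\omega = e^1\wedge\cdots\wedge e^n$ on the images $\phi(e_1), \dots, \phi(e_n)$ by expanding the wedge product as an alternating sum over permutations, which is how $(\phi^*\omega)(e_1, \dots, e_n)$ is computed above, and compares the result with a determinant obtained independently by Gaussian elimination. The matrix `A` is an arbitrary example and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation of 0..n-1, computed from its inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def top_form(*vectors):
    """omega = e^1 ^ ... ^ e^n on n vectors (components in {e_i}):
    sum over permutations s of sgn(s) * v_1^{s(1)} * ... * v_n^{s(n)}."""
    n = len(vectors)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for k in range(n):
            term *= vectors[k][p[k]]
        total += term
    return total

def det_gauss(a):
    """Classical determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in a]
    n, det = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det

def pullback_det(a):
    """(phi^* omega)(e_1, ..., e_n) = omega(phi(e_1), ..., phi(e_n)), where
    phi(e_i) = e_j a^j_i is column i of a; here omega(e_1, ..., e_n) = 1."""
    n = len(a)
    images = [[a[j][i] for j in range(n)] for i in range(n)]
    return top_form(*images)

A = [[2, 1, 0], [-1, 3, 4], [5, 0, 1]]
print(pullback_det(A), det_gauss(A))   # both give 27
```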
Theorem 11.18 If $V$ is finite-dimensional and $\phi, \psi \in L(V, V)$, then

(a) $\det(\phi\circ\psi) = (\det\phi)(\det\psi)$.

(b) If $\phi$ is the identity transformation, then $\det\phi = 1$.

(c) $\phi$ is an isomorphism if and only if $\det\phi \neq 0$, and if this is the case, then $\det\phi^{-1} = (\det\phi)^{-1}$.

Proof (a) By definition we have $(\phi\circ\psi)^*\omega = \det(\phi\circ\psi)\,\omega$. On the other hand, by Theorem 11.15(a) we know that $(\phi\circ\psi)^* = \psi^*\circ\phi^*$, and hence
$$(\phi\circ\psi)^*\omega = \psi^*(\phi^*\omega) = \psi^*[(\det\phi)\,\omega] = (\det\phi)\,\psi^*\omega = (\det\phi)(\det\psi)\,\omega .$$

(b) If $\phi = 1$ then $\phi^* = 1$ also (by Theorem 11.15(b)), and thus $\omega = \phi^*\omega = (\det\phi)\,\omega$ implies $\det\phi = 1$.

(c) First assume that $\phi$ is an isomorphism so that $\phi^{-1}$ exists. Then by parts (a) and (b) we see that
$$1 = \det(\phi\phi^{-1}) = (\det\phi)(\det\phi^{-1})$$
which implies $\det\phi \neq 0$ and $\det\phi^{-1} = (\det\phi)^{-1}$. Conversely, suppose that $\phi$ is not an isomorphism. Then $\operatorname{Ker}\phi \neq 0$ and there exists a nonzero $e_1 \in V$ such that $\phi(e_1) = 0$. By Theorem 2.10, we can extend this to a basis $\{e_1, \dots, e_n\}$ for $V$. But then for any nonzero $\omega \in \Lambda^n(V)$ we have
$$\begin{aligned} (\det\phi)\,\omega(e_1, \dots, e_n) &= (\phi^*\omega)(e_1, \dots, e_n) \\ &= \omega(\phi(e_1), \dots, \phi(e_n)) \\ &= \omega(0, \phi(e_2), \dots, \phi(e_n)) \\ &= 0 \end{aligned}$$
and since $\omega(e_1, \dots, e_n) \neq 0$ (because $\omega \neq 0$ and $\{e_i\}$ is a basis), we must have $\det\phi = 0$. ∎
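Each part of Theorem 11.18 can be spot-checked with matrices, using the Leibniz (permutation) expansion of the determinant. The example matrices below are arbitrary, the helper names are hypothetical, and exact rational arithmetic is used for the inverse.

```python
from fractions import Fraction
from itertools import permutations

def det(a):
    """Leibniz formula: sum over permutations s of sgn(s) * prod_i a[s(i)][i]."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= a[p[i]][i]
        total += term
    return total

def matmul(a, b):
    """Matrix of the composition phi o psi, given the matrices a of phi and b of psi."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse2(a):
    """Inverse of an invertible 2x2 matrix over the rationals."""
    d = Fraction(det(a))
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

phi = [[2, 1, 0], [-1, 3, 4], [5, 0, 1]]
psi = [[1, 2, 0], [0, 1, -1], [3, 0, 2]]
assert det(matmul(phi, psi)) == det(phi) * det(psi)      # part (a)
assert det(identity(3)) == 1                             # part (b)
singular = [[1, 2, 0], [2, 4, 1], [3, 6, 5]]             # phi(2e_1 - e_2) = 0
assert det(singular) == 0                                # part (c), not an isomorphism
b = [[2, 1], [-1, 1]]
assert det(inverse2(b)) == Fraction(1, det(b))           # part (c), det of the inverse
```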
Exercises

1. Prove Theorem 11.15(d).

2. Show that the matrix $\phi^*_T$ defined in Theorem 11.15(e) is just the $r$-fold Kronecker product $A\otimes\cdots\otimes A$ where $A = (a^i{}_j)$.
The next three exercises are related.

3. Let $\phi \in L(U, V)$ be an isomorphism, and suppose $T \in \mathcal{T}_r^{\,s}(U)$. Define the push-forward $\phi_* \in L(\mathcal{T}_r^{\,s}(U), \mathcal{T}_r^{\,s}(V))$ by
$$(\phi_*T)(\alpha^1, \dots, \alpha^s, v_1, \dots, v_r) = T(\phi^*\alpha^1, \dots, \phi^*\alpha^s, \phi^{-1}v_1, \dots, \phi^{-1}v_r)$$
where $\alpha^1, \dots, \alpha^s \in V^*$ and $v_1, \dots, v_r \in V$. If $\psi \in L(V, W)$ is also an isomorphism, prove the following:

(a) $(\psi\circ\phi)_* = \psi_*\circ\phi_*$.

(b) If $I \in L(U)$ is the identity map, then so is $I_* \in L(\mathcal{T}_r^{\,s}(U))$.

(c) $\phi_*$ is an isomorphism, and $(\phi_*)^{-1} = (\phi^{-1})_*$.

(d) If $T_1 \in \mathcal{T}_{r_1}^{\,s_1}(U)$ and $T_2 \in \mathcal{T}_{r_2}^{\,s_2}(U)$, then
$$\phi_*(T_1\otimes T_2) = (\phi_*T_1)\otimes(\phi_*T_2) .$$

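4. Let $\phi \in L(U, V)$ be an isomorphism, and let $U$ and $V$ have bases $\{e_i\}$ and $\{f_i\}$ respectively. Define the matrices $(a^i{}_j)$ and $(b^i{}_j)$ by $\phi(e_i) = f_j a^j{}_i$ and $\phi^{-1}(f_i) = e_j b^j{}_i$. Suppose $T \in \mathcal{T}_r^{\,s}(U)$ has components $T^{i_1\cdots i_s}{}_{j_1\cdots j_r}$ relative to $\{e_i\}$, and $S \in \mathcal{T}_r^{\,s}(V)$ has components $S^{i_1\cdots i_s}{}_{j_1\cdots j_r}$ relative to $\{f_i\}$. Show that the components of $\phi_*T$ and $(\phi^{-1})_*S$ are given by
$$(\phi_*T)^{i_1\cdots i_s}{}_{j_1\cdots j_r} = a^{i_1}{}_{p_1}\cdots a^{i_s}{}_{p_s}\,T^{p_1\cdots p_s}{}_{q_1\cdots q_r}\,b^{q_1}{}_{j_1}\cdots b^{q_r}{}_{j_r}$$
$$((\phi^{-1})_*S)^{i_1\cdots i_s}{}_{j_1\cdots j_r} = b^{i_1}{}_{p_1}\cdots b^{i_s}{}_{p_s}\,S^{p_1\cdots p_s}{}_{q_1\cdots q_r}\,a^{q_1}{}_{j_1}\cdots a^{q_r}{}_{j_r} .$$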

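5. Let $\{\omega^i\}$ be the basis dual to $\{e_i\}$ for $\mathbb{R}^2$. Let
$$T = 2e_1\otimes\omega^1 - e_2\otimes\omega^1 + 3e_1\otimes\omega^2$$
and suppose $\phi \in L(\mathbb{R}^2)$ and $\psi \in L(\mathbb{R}^3, \mathbb{R}^2)$ have the matrix representations
$$\phi = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad \psi = \begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & 2 \end{pmatrix} .$$
Compute $\operatorname{Tr} T$, $\phi^*T$, $\psi^*T$, $\operatorname{Tr}(\psi^*T)$, and $\phi_*T$.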
Suppose $\dim V = n$ and consider the space $\Lambda^n(V)$. Since this space is 1-dimensional, we consider the $n$-form
$$\omega = e^1\wedge\cdots\wedge e^n \in \Lambda^n(V)$$
where the basis $\{e^i\}$ for $V^*$ is dual to the basis $\{e_i\}$ for $V$. If $\{v_i = e_j v^j{}_i\}$ is any set of $n$ linearly independent vectors in $V$ then, according to Examples 11.2 and 11.8, we have
$$\omega(v_1, \dots, v_n) = \det(v^j{}_i)\,\omega(e_1, \dots, e_n) = \det(v^j{}_i) .$$
However, from the corollary to Theorem 11.13, this is just the oriented $n$-volume of the $n$-dimensional parallelepiped in $\mathbb{R}^n$ spanned by the vectors $\{v_i\}$. Therefore, we see that an $n$-form in some sense represents volumes in an $n$-dimensional space. We now proceed to make this idea precise, beginning with a careful definition of the notion of orientation on a vector space.

In order to make the basic idea clear, let us first consider the space $\mathbb{R}^2$ with all possible orthogonal coordinate systems. For example, we may consider the usual "right-handed" coordinate system $\{e_1, e_2\}$ shown below, or we may consider the alternative "left-handed" system $\{e'_1, e'_2\}$ also shown.



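[Figure: the right-handed basis $\{e_1, e_2\}$ and the left-handed basis $\{e'_1, e'_2\}$ in $\mathbb{R}^2$.]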

In the first case, we see that rotating $e_1$ into $e_2$ through the smallest angle between them involves a counterclockwise rotation, while in the second case, rotating $e'_1$ into $e'_2$ entails a clockwise rotation. This effect is shown in the elementary vector cross product, where the direction of $e_1\times e_2$ is defined by the "right-hand rule" to point out of the page, while $e'_1\times e'_2$ points into the page.

We now ask whether or not it is possible to continuously rotate $e'_1$ into $e_1$ and $e'_2$ into $e_2$ while maintaining a basis at all times. In other words, we ask if these two bases are in some sense equivalent. Without being rigorous, it should be clear that this cannot be done because there will always be one point where the vectors $e'_1$ and $e'_2$ will be collinear, and hence linearly dependent. This observation suggests that we consider the determinant of the matrix representing this change of basis.
In order to formulate this idea precisely, let us take a look at the matrix relating our two bases $\{e_i\}$ and $\{e'_i\}$ for $\mathbb{R}^2$. We thus write $e'_i = e_j a^j{}_i$ and investigate the determinant $\det(a^i{}_j)$. From the above figure, we see that
$$\begin{aligned} e'_1 &= e_1 a^1{}_1 + e_2 a^2{}_1 \quad\text{where } a^1{}_1 < 0 \text{ and } a^2{}_1 > 0 \\ e'_2 &= e_1 a^1{}_2 + e_2 a^2{}_2 \quad\text{where } a^1{}_2 > 0 \text{ and } a^2{}_2 > 0 \end{aligned}$$
and hence $\det(a^i{}_j) = a^1{}_1 a^2{}_2 - a^1{}_2 a^2{}_1 < 0$.

Now suppose that we view this transformation as a continuous modification of the identity transformation. This means we consider the basis vectors $e'_i$ to be continuous functions $e'_i(t)$ of the matrix $a^j{}_i(t)$ for $0 \le t \le 1$ where $a^j{}_i(0) = \delta^j{}_i$ and $a^j{}_i(1) = a^j{}_i$, so that $e'_i(0) = e_i$ and $e'_i(1) = e'_i$. In other words, we write $e'_i(t) = e_j a^j{}_i(t)$ for $0 \le t \le 1$. Now note that $\det(a^i{}_j(0)) = \det(\delta^i{}_j) = 1 > 0$, while $\det(a^i{}_j(1)) = \det(a^i{}_j) < 0$. Therefore, since the determinant is a continuous function of its entries, there must be some value $t_0 \in (0, 1)$ where $\det(a^i{}_j(t_0)) = 0$. It then follows that the vectors $e'_i(t_0)$ will be linearly dependent.

What we have just shown is that if we start with any pair of linearly inde- 
pendent vectors, and then transform this pair into another pair of linearly 
independent vectors by moving along any continuous path of linear transfor- 
mations that always maintains the linear independence of the pair, then every 
linear transformation along this path must have positive determinant. Another 
way of saying this is that if we have two bases that are related by a transfor- 
mation with negative determinant, then it is impossible to continuously trans- 
form one into the other while maintaining their independence. This argument 
clearly applies to R n and is not restricted to R 2 . 
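The argument above can be watched numerically. Take the left-handed basis $e'_1 = -e_1 + e_2$, $e'_2 = e_1 + e_2$ (so $\det(a^i{}_j) = -2 < 0$) and the straight-line path $a(t) = (1 - t)I + tA$, which is just one hypothetical choice of continuous path. The Python sketch below uses bisection to locate the value $t_0$ at which $\det a(t_0) = 0$. Here $\det a(t) = 1 - 2t - t^2$, so $t_0 = \sqrt{2} - 1$.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def path(t, a):
    """Straight-line path a(t) = (1 - t) I + t a from the identity (t = 0) to a (t = 1)."""
    return [[(1 - t) * (i == j) + t * a[i][j] for j in range(2)] for i in range(2)]

def find_singular_time(a, tol=1e-12):
    """Bisection for t0 in (0, 1) with det(a(t0)) = 0, assuming det(a) < 0."""
    lo, hi = 0.0, 1.0            # det > 0 at lo (identity), det < 0 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if det2(path(mid, a)) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Column i holds the components of e'_i: e'_1 = -e_1 + e_2, e'_2 = e_1 + e_2
a = [[-1, 1], [1, 1]]
t0 = find_singular_time(a)
print(t0, det2(path(t0, a)))     # t0 is approximately sqrt(2) - 1 = 0.4142...
```

At $t_0$ the columns of $a(t_0)$, that is, the vectors $e'_1(t_0)$ and $e'_2(t_0)$, are linearly dependent, exactly as the continuity argument predicts.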

Conversely, suppose we had assumed that $e'_i = e_j a^j{}_i$, but this time with $\det(a^i{}_j) > 0$. We want to show that $\{e_i\}$ may be continuously transformed into $\{e'_i\}$ while maintaining linear independence all the way. We first assume that both $\{e_i\}$ and $\{e'_i\}$ are orthonormal bases. After treating this special case, we will show how to take care of arbitrary bases.

(Unfortunately, the argument we are about to give relies on the topological 
concept of path connectedness. Since a complete discussion of this topic 
would lead us much too far astray, we shall be content to present only the 
fundamental concepts in Appendix C. Besides, this discussion is only motiva- 
tion, and the reader should not get too bogged down in the details of this 
argument. Those readers who know some topology should have no trouble 
filling in the necessary details if desired.) 

Since $\{e_i\}$ and $\{e'_i\}$ are orthonormal, it follows from Theorem 10.6 (applied to $\mathbb{R}$ rather than $\mathbb{C}$) that the transformation matrix $A = (a^i{}_j)$ defined by $e'_i = e_j a^j{}_i$ must be orthogonal, and hence $\det A = +1$ (by Theorem 10.8(a) and the fact that we are assuming $\{e_i\}$ and $\{e'_i\}$ are related by a transformation with positive determinant). By Theorem 10.19, there exists a nonsingular matrix $S$ such that $S^{-1}AS = M_\theta$ where $M_\theta$ is the block diagonal canonical form consisting of $+1$'s, $-1$'s, and $2\times 2$ rotation matrices $R(\theta_i)$ given by
$$R(\theta_i) = \begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix} .$$

It is important to realize that if there is more than one $+1$ or more than one $-1$, then each pair may be combined into one of the $R(\theta_i)$ by choosing either $\theta_i = \pi$ (for each pair of $-1$'s) or $\theta_i = 0$ (for each pair of $+1$'s). In this manner, we view $M_\theta$ as consisting entirely of $2\times 2$ rotation matrices, and at most a single $+1$ and/or $-1$. Since $\det R(\theta_i) = +1$ for any $\theta_i$, we see that (using Theorem 4.14) $\det M_\theta = +1$ if there is no $-1$, and $\det M_\theta = -1$ if there is a single $-1$. From $A = SM_\theta S^{-1}$, we see that $\det A = \det M_\theta$, and since we are requiring that $\det A > 0$, we must have the case where there is no $-1$ in $M_\theta$.

Since $\cos\theta_i$ and $\sin\theta_i$ are continuous functions of $\theta_i \in [0, 2\pi)$ (where the interval $[0, 2\pi)$ is a path connected set), we note that by parametrizing each $\theta_i$ by $\theta_i(t) = (1 - t)\theta_i$, the matrix $M_\theta$ may be continuously connected to the identity matrix $I$ (i.e., at $t = 1$). In other words, we consider the matrix $M_\theta(t)$ where $M_\theta(0) = M_\theta$ and $M_\theta(1) = I$. Hence every such $M_\theta$ (i.e., any matrix of the same form as our particular $M_\theta$, but with a different set of $\theta_i$'s) may be continuously connected to the identity matrix. (For those readers who know some topology, note all we have said is that the torus $[0, 2\pi)\times\cdots\times[0, 2\pi)$ is path connected, and hence so is its continuous image which is the set of all such $M_\theta$.)
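The parametrization $\theta_i(t) = (1 - t)\theta_i$ can be sampled numerically for a single $2\times 2$ block. This hypothetical Python sketch checks that $\det R((1 - t)\theta)$ stays equal to $+1$ along the path, that the path ends at the identity, and that $R(\pi)$ is the block $-I$ used above to absorb a pair of $-1$'s. The angle value is an arbitrary choice.

```python
import math

def rotation(theta):
    """The 2x2 block R(theta)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

theta = 2.0
ts = [k / 10 for k in range(11)]                 # sample points t = 0, 0.1, ..., 1
path = [rotation((1 - t) * theta) for t in ts]   # theta(t) = (1 - t) * theta

print(all(abs(det2(m) - 1) < 1e-12 for m in path))  # determinant stays +1
print(path[-1])                                      # t = 1 gives the identity matrix
print(rotation(math.pi))                             # approximately [[-1, 0], [0, -1]]
```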

We may write the (infinite) collection of all such $M_\theta$ as $\mathcal{M} = \{M_\theta\}$. Clearly $\mathcal{M}$ is a path connected set. Since $A = SM_\theta S^{-1}$ and $I = SIS^{-1}$, we see that both $A$ and $I$ are contained in the collection $S\mathcal{M}S^{-1} = \{SM_\theta S^{-1}\}$. But $S\mathcal{M}S^{-1}$ is also path connected since it is just the continuous image of a path connected set (matrix multiplication is obviously continuous). Thus we have shown that both $A$ and $I$ lie in the path connected set $S\mathcal{M}S^{-1}$, and hence $A$ may be continuously connected to $I$. Note also that every transformation along this path has positive determinant since $\det SM_\theta S^{-1} = \det M_\theta = 1 > 0$ for every $M_\theta \in \mathcal{M}$.

If we now take any path in $S\mathcal{M}S^{-1}$ that starts at $I$ and goes to $A$, then applying this path to the basis $\{e_i\}$ we obtain a continuous transformation from $\{e_i\}$ to $\{e'_i\}$ with everywhere positive determinant. This completes the proof for the special case of orthonormal bases.

Now suppose that $\{v_i\}$ and $\{v'_i\}$ are arbitrary bases related by a transformation with positive determinant. Starting with the basis $\{v_i\}$, we first apply the Gram-Schmidt process (Theorem 2.21) to $\{v_i\}$ to obtain an orthonormal basis $\{e_i\} = \{v_j b^j{}_i\}$. This orthonormalization process may be visualized as a sequence $v_i(t) = v_j b^j{}_i(t)$ (for $0 \le t \le 1$) of continuous scalings and rotations that always maintain linear independence such that $v_i(0) = v_i$ (i.e., $b^j{}_i(0) = \delta^j{}_i$) and $v_i(1) = e_i$ (i.e., $b^j{}_i(1) = b^j{}_i$). Hence we have a continuous transformation $b^j{}_i(t)$ taking $\{v_i\}$ into $\{e_i\}$ with $\det(b^j{}_i(t)) > 0$ (the transformation starts with $\det(b^j{}_i(0)) = \det I > 0$, and since the vectors are always independent, it must maintain $\det(b^j{}_i(t)) \neq 0$). Similarly, we may transform $\{v'_i\}$ into an orthonormal basis $\{e'_i\}$ by a continuous transformation with positive determinant. (Alternatively, it was shown in Exercise 5.4.14 that the Gram-Schmidt process is represented by an upper-triangular matrix with all positive diagonal elements, and hence its determinant is positive.) Now $\{e_i\}$ and $\{e'_i\}$ are related by an orthogonal transformation that must also have determinant equal to $+1$ because $\{v_i\}$ and $\{v'_i\}$ are related by a transformation with positive determinant, and both of the Gram-Schmidt transformations have positive determinant. This reduces the general case to the special case treated above.
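The remark about Exercise 5.4.14 can be checked on an example. The hypothetical Python sketch below runs classical Gram-Schmidt with the Euclidean inner product on an arbitrary basis of $\mathbb{R}^3$ and records the coefficients $R^j{}_i = \langle v_i, e_j\rangle$ for which $v_i = e_j R^j{}_i$. This matrix is upper triangular with positive diagonal entries (the norms produced at each step), so its determinant, and therefore the determinant of its inverse $(b^j{}_i)$, is positive.

```python
import math

def gram_schmidt(vectors):
    """Classical Gram-Schmidt in R^n with the Euclidean inner product."""
    ortho = []
    for v in vectors:
        w = list(v)
        for e in ortho:
            c = sum(a * b for a, b in zip(v, e))        # <v, e>
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(sum(x * x for x in w))
        ortho.append([x / norm for x in w])
    return ortho

def coefficients(vectors, ortho):
    """Upper-triangular R with v_i = e_j R^j_i; entry R[j][i] = <v_i, e_j> for j <= i."""
    n = len(vectors)
    return [[sum(a * b for a, b in zip(vectors[i], ortho[j])) if j <= i else 0.0
             for i in range(n)] for j in range(n)]

vs = [(2.0, 1.0, 0.0), (1.0, 3.0, 1.0), (0.0, 1.0, 4.0)]
es = gram_schmidt(vs)
R = coefficients(vs, es)
diag = [R[i][i] for i in range(3)]
print(diag, math.prod(diag) > 0)   # positive diagonal, hence positive determinant
```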

With this discussion as motivation, we make the following definition. Let $\{v_1, \dots, v_n\}$ and $\{v'_1, \dots, v'_n\}$ be two ordered bases for a real vector space $V$, and assume that $v'_i = v_j a^j{}_i$. These two bases are said to be similarly oriented if $\det(a^i{}_j) > 0$, and we write this as $\{v_i\} \approx \{v'_i\}$. In other words, $\{v_i\} \approx \{v'_i\}$ if $v'_i = \phi(v_i)$ with $\det\phi > 0$. We leave it to the reader to show that this defines an equivalence relation on the set of all ordered bases for $V$ (see Exercise 11.9.1). We denote the equivalence class of the basis $\{v_i\}$ by $[v_i]$.
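Because $v'_i = v_j a^j{}_i$ means $V' = VA$ for the matrices $V$, $V'$ whose columns are the basis vectors (in standard components), we have $\det A = \det V'/\det V$, so two bases are similarly oriented exactly when $\det V$ and $\det V'$ have the same sign. A hypothetical Python sketch of this test:

```python
from itertools import permutations

def det(cols):
    """Determinant of the matrix whose i-th column holds the components of cols[i]."""
    n = len(cols)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= cols[i][p[i]]
        total += term
    return total

def similarly_oriented(basis, other):
    """True if {v_i} ~ {v'_i}: with v'_i = v_j a^j_i, det(a) = det(V') / det(V)."""
    return det(basis) * det(other) > 0

v = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
w = [(2, 1, 0), (0, 1, 0), (1, 1, 1)]   # det = 2 > 0
u = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]   # first two vectors swapped: det = -1
print(similarly_oriented(v, w), similarly_oriented(v, u))   # True False
```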

It is worth pointing out that had we instead required $\det(a^i{}_j) < 0$, then this would not have defined an equivalence relation. This is because if $(b^i{}_j)$ is another such transformation with $\det(b^i{}_j) < 0$, then the composite transformation satisfies
$$\det(a^i{}_j b^j{}_k) = \det(a^i{}_j)\det(b^j{}_k) > 0$$
so the relation would not be transitive. Intuitively this is quite reasonable since a combination of two reflections (each of which has negative determinant) is not another reflection.

We now define an orientation of $V$ to be an equivalence class of ordered bases. The space $V$ together with an orientation $[v_i]$ is called an oriented vector space $(V, [v_i])$. Since the determinant of a linear transformation that relates any two bases must be either positive or negative, we see that $V$ has exactly two orientations. In particular, if $\{v_i\}$ is any given basis, then every other basis belonging to the equivalence class $[v_i]$ of $\{v_i\}$ will be related to $\{v_i\}$ by a transformation with positive determinant, while those bases related to $\{v_i\}$ by a transformation with negative determinant will be related to each other by a transformation with positive determinant (see Exercise 11.9.1).
Now recall we have seen that $n$-forms seem to be related to $n$-volumes in an $n$-dimensional space $V$. To precisely define this relationship, we formulate orientations in terms of $n$-forms. To begin with, the nonzero elements of the 1-dimensional space $\Lambda^n(V)$ are called volume forms (or sometimes volume elements) on $V$. If $\omega_1$ and $\omega_2$ are volume forms, then $\omega_1$ is said to be equivalent to $\omega_2$ if $\omega_1 = c\,\omega_2$ for some real $c > 0$, and in this case we also write $\omega_1 \approx \omega_2$. Since every element of $\Lambda^n(V)$ is related to every other element by a relationship of the form $\omega_1 = a\,\omega_2$ for some real $a$ (i.e., $-\infty < a < \infty$), it is clear that this equivalence relation divides the set of all nonzero volume forms into two distinct groups (i.e., equivalence classes). We can relate any ordered basis $\{v_i\}$ for $V$ to a specific volume form by defining
$$\omega = v^1\wedge\cdots\wedge v^n$$
where $\{v^i\}$ is the basis dual to $\{v_i\}$. That this association is meaningful is shown in the next result.

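Theorem 11.19 Let $\{v_i\}$ and $\{\bar v_i\}$ be bases for $V$, and let $\{v^i\}$ and $\{\bar v^i\}$ be the corresponding dual bases. Define the volume forms
$$\omega = v^1\wedge\cdots\wedge v^n$$
and
$$\bar\omega = \bar v^1\wedge\cdots\wedge\bar v^n .$$
Then $\{v_i\} \approx \{\bar v_i\}$ if and only if $\omega \approx \bar\omega$.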

Proof First suppose that $\{v_i\} \approx \{\bar v_i\}$. Then $\bar v_i = \phi(v_i)$ where $\det\phi > 0$, and hence (using
$$\omega(v_1, \dots, v_n) = v^1\wedge\cdots\wedge v^n(v_1, \dots, v_n) = 1$$
as shown in Example 11.8)
$$\begin{aligned} \omega(\bar v_1, \dots, \bar v_n) &= \omega(\phi(v_1), \dots, \phi(v_n)) \\ &= (\phi^*\omega)(v_1, \dots, v_n) \\ &= (\det\phi)\,\omega(v_1, \dots, v_n) \\ &= \det\phi . \end{aligned}$$
If we assume that $\omega = c\,\bar\omega$ for some $-\infty < c < \infty$, then using $\bar\omega(\bar v_1, \dots, \bar v_n) = 1$ we see that our result implies $c = \det\phi > 0$ and thus $\omega \approx \bar\omega$.
Conversely, if $\omega = c\,\bar\omega$ where $c > 0$, then assuming that $\bar v_i = \phi(v_i)$, the above calculation shows that $\det\phi = c > 0$, and hence $\{v_i\} \approx \{\bar v_i\}$. ∎
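The proof shows that the constant $c$ in $\omega = c\,\bar\omega$ is exactly $\det\phi$. A hypothetical two-dimensional Python check: relative to the standard basis, $v^1\wedge v^2 = (\det V)^{-1}\,e^1\wedge e^2$, where $V$ is the matrix whose columns are $v_1, v_2$ (because $v^1\wedge v^2(v_1, v_2) = 1$ while $e^1\wedge e^2(v_1, v_2) = \det V$). The ratio of these scale factors should equal the determinant of the transition matrix $V^{-1}\bar V$. The two example bases are arbitrary.

```python
from fractions import Fraction

def det2(c1, c2):
    """Determinant of the 2x2 matrix with columns c1, c2."""
    return c1[0] * c2[1] - c2[0] * c1[1]

def dual_top_form_scale(basis):
    """k with v^1 ^ v^2 = k * (e^1 ^ e^2); since e^1 ^ e^2 (v_1, v_2) = det V, k = 1/det V."""
    return Fraction(1, det2(*basis))

def transition_det(basis, other):
    """det(a) where other_i = basis_j a^j_i, i.e. a = V^{-1} V' (2x2 case)."""
    (p, r), (q, s) = basis           # V = [[p, q], [r, s]]; columns are the basis vectors
    d = p * s - q * r
    inv = [[Fraction(s, d), Fraction(-q, d)], [Fraction(-r, d), Fraction(p, d)]]
    cols = [[inv[i][0] * x + inv[i][1] * y for i in range(2)] for (x, y) in other]
    return det2(*cols)

v = [(1, 2), (3, 1)]
vbar = [(2, 0), (1, 1)]
# omega = c * omegabar, with omega = v^1 ^ v^2 and omegabar = vbar^1 ^ vbar^2
c = dual_top_form_scale(v) / dual_top_form_scale(vbar)
print(c, transition_det(v, vbar))    # equal (here -2/5), so these bases are not similarly oriented
```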

What this theorem shows us is that an equivalence class of bases uniquely determines an equivalence class of volume forms and conversely. Therefore it is consistent with our earlier definitions to say that an equivalence class $[\omega]$ of volume forms on $V$ defines an orientation on $V$, and the space $V$ together with an orientation $[\omega]$ is called an oriented vector space $(V, [\omega])$. A basis $\{v_i\}$ for $(V, [\omega])$ is now said to be positively oriented if $\omega(v_1, \dots, v_n) > 0$. Not surprisingly, the equivalence class $[-\omega]$ is called the reverse orientation, and the basis $\{v_i\}$ is said to be negatively oriented if $\omega(v_1, \dots, v_n) < 0$. Note that if the ordered basis $\{v_1, v_2, \dots, v_n\}$ is negatively oriented, then the basis $\{v_2, v_1, \dots, v_n\}$ will be positively oriented because $\omega(v_2, v_1, \dots, v_n) = -\omega(v_1, v_2, \dots, v_n) > 0$. By way of additional terminology, the standard orientation on $\mathbb{R}^n$ is that orientation defined by either the standard ordered basis $\{e_1, \dots, e_n\}$, or the corresponding volume form $e^1\wedge\cdots\wedge e^n$.

In order to proceed any further, we must introduce the notion of a metric 
on V. This is the subject of the next section. 

Exercises 

1. (a) Show that the collection of all similarly oriented bases for $V$ defines an equivalence relation on the set of all ordered bases for $V$.

(b) Let $\{v_i\}$ be a basis for $V$. Show that all other bases related to $\{v_i\}$ by a transformation with negative determinant will be related to each other by a transformation with positive determinant.

2. Let $(U, \omega)$ and $(V, \mu)$ be oriented vector spaces with chosen volume elements. We say that $\phi \in L(U, V)$ is volume preserving if $\phi^*\mu = \omega$. If $\dim U = \dim V$ is finite, show that $\phi$ is an isomorphism.

3. Let $(U, [\omega])$ and $(V, [\mu])$ be oriented vector spaces. We say that $\phi \in L(U, V)$ is orientation preserving if $\phi^*\mu \in [\omega]$. If $\dim U = \dim V$ is finite, show that $\phi$ is an isomorphism. If $U = V = \mathbb{R}^3$, give an example of a linear transformation that is orientation preserving but not volume preserving.
We now generalize slightly our definition of inner products on $V$. In particular, recall from Section 2.4 (and the beginning of Section 9.2) that property (IP3) of an inner product requires that $\langle u, u\rangle \ge 0$ for all $u \in V$ and $\langle u, u\rangle = 0$ if and only if $u = 0$. If we drop this condition entirely, then we obtain an indefinite inner product on $V$. (In fact, some authors define an inner product as obeying only (IP1) and (IP2), and then refer to what we have called an inner product as a "positive definite inner product.") If we replace (IP3) by the weaker requirement

(IP3') $\langle u, v\rangle = 0$ for all $v \in V$ if and only if $u = 0$

then our inner product is said to be nondegenerate. (Note that every example of an inner product given in this book up to now has been nondegenerate.) Thus a real nondegenerate indefinite inner product is just a real nondegenerate symmetric bilinear map. We will soon see an example of an inner product with the property that $\langle u, u\rangle = 0$ for some $u \neq 0$ (see Example 11.13 below).

Throughout the remainder of this chapter, we will assume that our inner products are indefinite and nondegenerate unless otherwise noted. We furthermore assume that we are dealing exclusively with real vector spaces.

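Let $\{e_i\}$ be a basis for an inner product space $V$. Since in general we will not have $\langle e_i, e_j\rangle = \delta_{ij}$, we define the scalars $g_{ij}$ by
$$g_{ij} = \langle e_i, e_j\rangle .$$
In terms of the $g_{ij}$, we have for any $X, Y \in V$
$$\langle X, Y\rangle = \langle x^i e_i, y^j e_j\rangle = x^i y^j\langle e_i, e_j\rangle = g_{ij}x^i y^j .$$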

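If $\{\bar e_i\}$ is another basis for $V$, then we will have $\bar e_i = e_j a^j{}_i$ for some nonsingular transition matrix $A = (a^j{}_i)$. Hence, writing $\bar g_{ij} = \langle\bar e_i, \bar e_j\rangle$ we see that
$$\bar g_{ij} = \langle\bar e_i, \bar e_j\rangle = \langle e_r a^r{}_i, e_s a^s{}_j\rangle = a^r{}_i a^s{}_j\langle e_r, e_s\rangle = a^r{}_i a^s{}_j\,g_{rs}$$
which shows that the $g_{ij}$ transform like the components of a second-rank covariant tensor. Indeed, defining the tensor $g \in \mathcal{T}_2(V)$ by
$$g(X, Y) = \langle X, Y\rangle$$
results in
$$g(e_i, e_j) = \langle e_i, e_j\rangle = g_{ij}$$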

as it should. We are therefore justified in defining the (covariant) metric tensor
$$g = g_{ij}\,\omega^i\otimes\omega^j \in \mathcal{T}_2(V)$$
(where $\{\omega^i\}$ is the basis dual to $\{e_i\}$) by $g(X, Y) = \langle X, Y\rangle$. In fact, since the inner product is nondegenerate and symmetric (i.e., $\langle X, Y\rangle = \langle Y, X\rangle$), we see that $g$ is a nondegenerate symmetric tensor (i.e., $g_{ij} = g_{ji}$).
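The transformation rule $\bar g_{ij} = a^r{}_i a^s{}_j g_{rs}$ is the matrix identity $\bar G = A^T G A$. The Python sketch below is an illustration with an arbitrarily chosen indefinite metric and transition matrix (the names are hypothetical): it computes $\bar g_{ij} = \langle\bar e_i, \bar e_j\rangle$ directly and also by the transformation rule.

```python
def inner(x, y, g):
    """<X, Y> = g_ij x^i y^j for component lists x, y and metric matrix g."""
    n = len(g)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def transform_metric(g, a):
    """gbar_ij = a^r_i a^s_j g_rs, i.e. the matrix product A^T G A."""
    n = len(g)
    return [[sum(a[r][i] * a[s][j] * g[r][s] for r in range(n) for s in range(n))
             for j in range(n)] for i in range(n)]

g = [[1, 0], [0, -1]]            # an indefinite, nondegenerate example metric
a = [[2, 1], [1, 3]]             # column i holds the components a^j_i of ebar_i
ebar = [[a[j][i] for j in range(2)] for i in range(2)]
direct = [[inner(ebar[i], ebar[j], g) for j in range(2)] for i in range(2)]
print(direct, transform_metric(g, a))   # both [[3, -1], [-1, -8]]
```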

Next, we notice that given any vector $A \in V$, we may define a linear functional $\langle A, \cdot\rangle$ on $V$ by the assignment $B \mapsto \langle A, B\rangle$. In other words, for any $A \in V$, we associate the 1-form $\alpha$ defined by $\alpha(B) = \langle A, B\rangle$ for every $B \in V$. Note that the kernel of the mapping $A \mapsto \langle A, \cdot\rangle$ (which is easily seen to be a vector space homomorphism) consists of only the zero vector (since $\langle A, B\rangle = 0$ for every $B \in V$ implies that $A = 0$), and hence, because $\dim V^* = \dim V$ is finite, this association is an isomorphism. Given any basis $\{e_i\}$ for $V$, the components $a_i$ of $\alpha \in V^*$ are given in terms of those of $A = a^i e_i \in V$ by
$$a_i = \alpha(e_i) = \langle A, e_i\rangle = \langle a^j e_j, e_i\rangle = a^j\langle e_j, e_i\rangle = a^j g_{ji} .$$
Thus, to any contravariant vector $A = a^i e_i \in V$, we can associate a unique covariant vector $\alpha \in V^*$ by
$$\alpha = a_i\,\omega^i = (a^j g_{ji})\,\omega^i$$
where $\{\omega^i\}$ is the basis for $V^*$ dual to the basis $\{e_i\}$ for $V$. In other words, we write
$$a_i = a^j g_{ji}$$
and we say that $a_i$ arises by lowering the index $j$ of $a^j$.
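Lowering an index is a single contraction with $(g_{ij})$. In the hypothetical Python sketch below, components are plain lists and the metric is an arbitrary symmetric nonsingular example; it also checks that the resulting 1-form satisfies $\alpha(B) = a_i b^i = \langle A, B\rangle$.

```python
def lower(a_up, g):
    """a_i = a^j g_ji: lower the index of a list of contravariant components."""
    n = len(g)
    return [sum(a_up[j] * g[j][i] for j in range(n)) for i in range(n)]

g = [[2, 1], [1, 3]]          # example metric components g_ij = <e_i, e_j>
A = [1, -2]                   # contravariant components a^i of A
alpha = lower(A, g)           # covariant components a_i of the 1-form alpha
B = [4, 1]
lhs = sum(alpha[i] * B[i] for i in range(2))                           # alpha(B) = a_i b^i
rhs = sum(g[i][j] * A[i] * B[j] for i in range(2) for j in range(2))   # <A, B>
print(alpha, lhs, rhs)        # [0, -5] -5 -5
```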

Example 11.12 If we consider the space $\mathbb{R}^n$ with a Cartesian coordinate system $\{e_i\}$, then we have $g_{ij} = \langle e_i, e_j\rangle = \delta_{ij}$, and hence $a_i = \delta_{ij}a^j = a^i$. Therefore, in a Cartesian coordinate system, there is no distinction between the components of covariant and contravariant vectors. This explains why 1-forms never arise in elementary treatments of vector analysis. ∆

Since the metric tensor is nondegenerate, the matrix $(g_{ij})$ must be nonsingular (or else the mapping $a^j \mapsto a_i$ would not be an isomorphism). We can therefore define the inverse matrix $(g^{ij})$ by
$$g^{ij}g_{jk} = g_{kj}g^{ji} = \delta^i{}_k .$$

11.10 THE METRIC TENSOR AND VOLUME FORMS 






Using $(g^{ij})$, we see that the inverse of the mapping $a^j \mapsto a_i$ is given by
$$g^{ij}a_j = a^i .$$
This is called, naturally enough, raising an index. We will show below that the $g^{ij}$ do indeed form the components of a tensor.
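Raising an index uses the inverse matrix $(g^{ij})$. This hypothetical Python sketch (an arbitrary example metric stored as a $2\times 2$ list, with exact rational arithmetic via `fractions.Fraction`) computes $(g^{ij})$, checks that $g^{ij}g_{jk} = \delta^i{}_k$, and confirms that raising undoes lowering.

```python
from fractions import Fraction

def inverse2(g):
    """Inverse matrix (g^ij) of a nonsingular 2x2 metric matrix (g_ij)."""
    d = Fraction(g[0][0] * g[1][1] - g[0][1] * g[1][0])
    return [[g[1][1] / d, -g[0][1] / d], [-g[1][0] / d, g[0][0] / d]]

def lower(a_up, g):
    """a_i = a^j g_ji."""
    return [sum(a_up[j] * g[j][i] for j in range(2)) for i in range(2)]

def raise_(a_low, g_inv):
    """a^i = g^ij a_j (trailing underscore because raise is a Python keyword)."""
    return [sum(g_inv[i][j] * a_low[j] for j in range(2)) for i in range(2)]

g = [[2, 1], [1, 3]]
g_inv = inverse2(g)
delta = [[sum(g_inv[i][j] * g[j][k] for j in range(2)) for k in range(2)] for i in range(2)]
A = [1, -2]
print(delta, raise_(lower(A, g), g_inv))   # identity matrix; raising undoes lowering
```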

It is worth remarking that the "tensor" $g^i{}_j = g^{ik}g_{kj} = \delta^i{}_j$ $(= \delta_j{}^i)$ is unique in that it has the same components in any coordinate system. Indeed, if $\{e_i\}$ and $\{\bar e_i\}$ are two bases for a space $V$ with corresponding dual bases $\{\omega^i\}$ and $\{\bar\omega^i\}$, then $\bar e_i = e_j a^j{}_i$ and $\bar\omega^i = b^i{}_j\,\omega^j = (a^{-1})^i{}_j\,\omega^j$ (see the discussion following Theorem 11.2). Therefore, if we define the tensor $\delta$ to have the same values in the first coordinate system as the Kronecker delta, then $\delta^i{}_j = \delta(\omega^i, e_j)$. If we now define the symbol $\bar\delta^i{}_j$ by $\bar\delta^i{}_j = \delta(\bar\omega^i, \bar e_j)$, then we see that
$$\begin{aligned} \bar\delta^i{}_j = \delta(\bar\omega^i, \bar e_j) &= \delta((a^{-1})^i{}_k\,\omega^k, e_r a^r{}_j) = (a^{-1})^i{}_k\,a^r{}_j\,\delta(\omega^k, e_r) \\ &= (a^{-1})^i{}_k\,a^r{}_j\,\delta^k{}_r = (a^{-1})^i{}_k\,a^k{}_j = \delta^i{}_j . \end{aligned}$$
This shows that the $\delta^i{}_j$ are in fact the components of a tensor, and that these components are the same in any coordinate system.

We would now like to show that the scalars $g^{ij}$ are indeed the components of a tensor. There are several ways that this can be done. First, let us write $g_{ij}g^{jk} = \delta^k{}_i$ where we know that both $g_{ij}$ and $\delta^k{}_i$ are tensors. Multiplying both sides of this equation by $(a^{-1})^r{}_k\,a^i{}_s$ and using $(a^{-1})^r{}_k\,a^i{}_s\,\delta^k{}_i = \delta^r{}_s$ we find
$$g_{ij}g^{jk}(a^{-1})^r{}_k\,a^i{}_s = \delta^r{}_s .$$
Now substitute $g_{ij} = g_{it}\delta^t{}_j = g_{it}a^t{}_q(a^{-1})^q{}_j$ to obtain
$$[a^i{}_s a^t{}_q\,g_{it}][(a^{-1})^q{}_j(a^{-1})^r{}_k\,g^{jk}] = \delta^r{}_s .$$
Since $g_{it}$ is a tensor, we know that $a^i{}_s a^t{}_q\,g_{it} = \bar g_{sq}$. If we write
$$\bar g^{qr} = (a^{-1})^q{}_j(a^{-1})^r{}_k\,g^{jk}$$
then we will have defined the $g^{jk}$ to transform as the components of a tensor, and furthermore, they have the requisite property that $\bar g_{sq}\bar g^{qr} = \delta^r{}_s$. Therefore we have defined the (contravariant) metric tensor $G \in \mathcal{T}_0^{\,2}(V)$ by
$$G = g^{ij}\,e_i\otimes e_j$$
where $g^{ij}g_{jk} = \delta^i{}_k$.

There is another interesting way for us to define the tensor $G$. We have already seen that a vector $A = a^i e_i \in V$ defines a unique linear form $\alpha = a_j\,\omega^j \in V^*$ by the association $\alpha = g_{ij}a^i\,\omega^j$. If we denote the inverse of the matrix $(g_{ij})$ by $(g^{ij})$ so that $g^{ij}g_{jk} = \delta^i{}_k$, then to any linear form $\alpha = a_i\,\omega^i \in V^*$ there corresponds a unique vector $A = a^i e_i \in V$ defined by $A = g^{ij}a_i\,e_j$. We can now use this isomorphism to define an inner product on $V^*$. In other words, if $\langle\ ,\ \rangle$ is an inner product on $V$, we define an inner product $\langle\ ,\ \rangle$ on $V^*$ by
$$\langle\alpha, \beta\rangle = \langle A, B\rangle$$
where $A, B \in V$ are the vectors corresponding to the 1-forms $\alpha, \beta \in V^*$.

Let us write an arbitrary basis vector $e_i$ in terms of its components relative to the basis $\{e_i\}$ as $e_i = \delta^j{}_i e_j$. Therefore, in the above isomorphism, we may define the linear form $\hat e_i \in V^*$ corresponding to the basis vector $e_i$ by
$$\hat e_i = g_{jk}\delta^j{}_i\,\omega^k = g_{ik}\,\omega^k$$
and hence using the inverse matrix, we find that
$$\omega^k = g^{ki}\,\hat e_i .$$
Applying our definition of the inner product in $V^*$ we have $\langle\hat e_i, \hat e_j\rangle = \langle e_i, e_j\rangle = g_{ij}$, and therefore we obtain
$$\langle\omega^i, \omega^j\rangle = \langle g^{ir}\hat e_r, g^{js}\hat e_s\rangle = g^{ir}g^{js}\langle\hat e_r, \hat e_s\rangle = g^{ir}g^{js}g_{rs} = g^{ir}\delta^j{}_r = g^{ij}$$
which is the analogue in $V^*$ of the definition $g_{ij} = \langle e_i, e_j\rangle$ in $V$.
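Lastly, since $\bar\omega^i = (a^{-1})^i{}_j\,\omega^j$, we see that
$$\begin{aligned} \bar g^{ij} = \langle\bar\omega^i, \bar\omega^j\rangle &= \langle (a^{-1})^i{}_r\,\omega^r, (a^{-1})^j{}_s\,\omega^s\rangle = (a^{-1})^i{}_r(a^{-1})^j{}_s\langle\omega^r, \omega^s\rangle \\ &= (a^{-1})^i{}_r(a^{-1})^j{}_s\,g^{rs} \end{aligned}$$
so the scalars $g^{ij}$ may be considered to be the components of a symmetric tensor $G \in \mathcal{T}_0^{\,2}(V)$ defined as above by $G = g^{ij}\,e_i\otimes e_j$.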
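In components the inner product on $V^*$ is $\langle\alpha, \beta\rangle = g^{ij}a_i b_j$, which follows from $A = g^{ij}a_i\,e_j$ and the symmetry of $g$. The hypothetical Python sketch below checks on an arbitrary example that this agrees with $\langle A, B\rangle$, and that $\langle\omega^1, \omega^2\rangle = g^{12}$.

```python
from fractions import Fraction

def inverse2(g):
    """Inverse matrix (g^ij) of a nonsingular 2x2 metric matrix (g_ij)."""
    d = Fraction(g[0][0] * g[1][1] - g[0][1] * g[1][0])
    return [[g[1][1] / d, -g[0][1] / d], [-g[1][0] / d, g[0][0] / d]]

def pair(x, y, m):
    """Bilinear form m_ij x^i y^j (or m^ij x_i y_j) on component lists."""
    return sum(m[i][j] * x[i] * y[j] for i in range(2) for j in range(2))

g = [[2, 1], [1, 3]]
g_inv = inverse2(g)
A, B = [1, -2], [4, 1]
alpha = [sum(A[j] * g[j][i] for j in range(2)) for i in range(2)]   # a_i = a^j g_ji
beta = [sum(B[j] * g[j][i] for j in range(2)) for i in range(2)]    # b_i = b^j g_ji
print(pair(alpha, beta, g_inv), pair(A, B, g))   # both equal <A, B>
print(pair([1, 0], [0, 1], g_inv) == g_inv[0][1])  # <omega^1, omega^2> = g^12
```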

Now let $g = \langle\ ,\ \rangle$ be an arbitrary (i.e., possibly degenerate) real symmetric bilinear form on the inner product space $V$. It follows from the corollary to Theorem 9.14 that there exists a basis $\{e_i\}$ for $V$ in which the matrix $(g_{ij})$ of $g$ takes the unique diagonal form
$$(g_{ij}) = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}$$
where $r + s + t = \dim V = n$. Thus
$$g_{ii} = \begin{cases} 1 & \text{for } 1 \le i \le r \\ -1 & \text{for } r + 1 \le i \le r + s \\ 0 & \text{for } r + s + 1 \le i \le n . \end{cases}$$
If $r + s < n$, the inner product is degenerate and we say that the space $V$ is singular (with respect to the given inner product). If $r + s = n$, then the inner product is nondegenerate, and the basis $\{e_i\}$ is orthonormal. In the orthonormal case, if either $r = 0$ or $r = n$, the space is said to be ordinary Euclidean, and if $0 < r < n$, then the space is called pseudo-Euclidean. Recall that the number $r - s = r - (n - r) = 2r - n$ is called the signature of $g$ (which is therefore just the trace of $(g_{ij})$). Moreover, the number of $-1$'s is called the index of $g$, and is denoted by $\operatorname{Ind}(g)$. If $g = \langle\ ,\ \rangle$ is to be a metric on $V$, then by definition, we must have $r + s = n$ so that the inner product is nondegenerate. In this case, the basis $\{e_i\}$ is called $g$-orthonormal.

Example 11.13 If the metric $g$ represents a positive definite inner product on $V$, then we must have $\operatorname{Ind}(g) = 0$, and such a metric is said to be Riemannian. Alternatively, another well-known metric is the Lorentz metric used in the theory of special relativity. By definition, a Lorentz metric $\eta$ has $\operatorname{Ind}(\eta) = 1$. Therefore, if $\eta$ is a Lorentz metric, an $\eta$-orthonormal basis $\{e_1, \dots, e_n\}$ ordered in such a way that $\eta(e_i, e_i) = +1$ for $i = 1, \dots, n - 1$ and $\eta(e_n, e_n) = -1$ is called a Lorentz frame.

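Thus, in terms of a $g$-orthonormal basis, a Riemannian metric has the form
$$(g_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
while in a Lorentz frame, a Lorentz metric takes the form
$$(\eta_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{pmatrix}$$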
It is worth remarking that a Lorentz metric is also frequently defined as having $\operatorname{Ind}(\eta) = n - 1$. In this case we have $\eta(e_1, e_1) = 1$ and $\eta(e_i, e_i) = -1$ for each $i = 2, \dots, n$. We also point out that a vector $v \in V$ is called timelike if $\eta(v, v) < 0$, lightlike (or null) if $\eta(v, v) = 0$, and spacelike if $\eta(v, v) > 0$. Note that a Lorentz inner product is clearly indefinite since, for example, the nonzero vector $v$ with components $v = (0, 0, 1, 1)$ has the property that $\langle v, v\rangle = \eta(v, v) = 0$. ∆
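The timelike, lightlike, and spacelike classification is a sign test on $\eta(v, v)$. A minimal Python sketch, assuming a Lorentz frame with the $\operatorname{Ind}(\eta) = 1$ convention, so that $\eta = \operatorname{diag}(1, \dots, 1, -1)$ in components (the function names are hypothetical):

```python
def eta(v, w):
    """Lorentz inner product in a Lorentz frame with Ind = 1: diag(1, ..., 1, -1)."""
    return sum(a * b for a, b in zip(v[:-1], w[:-1])) - v[-1] * w[-1]

def classify(v):
    q = eta(v, v)
    if q < 0:
        return "timelike"
    if q == 0:
        return "lightlike"
    return "spacelike"

print(classify((0, 0, 1, 1)))   # lightlike: the nonzero null vector from the text
print(classify((0, 0, 0, 1)))   # timelike
print(classify((1, 0, 0, 0)))   # spacelike
```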

We now show that introducing a metric on V leads to a unique volume 
form on V. 

Theorem 11.20 Let $g$ be a metric on an $n$-dimensional oriented vector space $(V, [\omega])$. Then, corresponding to the metric $g$, there exists a unique volume form $\mu = \mu(g) \in [\omega]$ such that $\mu(e_1, \dots, e_n) = 1$ for every positively oriented $g$-orthonormal basis $\{e_i\}$ for $V$. Moreover, if $\{v_i\}$ is any (not necessarily $g$-orthonormal) positively oriented basis for $V$ with dual basis $\{v^i\}$, then
$$\mu = |\det(g(v_i, v_j))|^{1/2}\,v^1\wedge\cdots\wedge v^n .$$
In particular, if $\{v_i\} = \{e_i\}$ is a $g$-orthonormal basis, then $\mu = e^1\wedge\cdots\wedge e^n$.

Proof Since $\omega \neq 0$, there exists a positively oriented $g$-orthonormal basis $\{e_i\}$ such that $\omega(e_1, \dots, e_n) > 0$ (we can multiply $e_1$ by $-1$ if necessary in order that $\{e_i\}$ be positively oriented). We now define $\mu \in [\omega]$ by
$$\mu(e_1, \dots, e_n) = 1 .$$
That this defines a unique $\mu$ follows by multilinearity. We claim that if $\{f_i\}$ is any other positively oriented $g$-orthonormal basis, then $\mu(f_1, \dots, f_n) = 1$ also. To show this, we first prove a simple general result.

Suppose $\{v_i\}$ is any other basis for $V$ related to the $g$-orthonormal basis $\{e_i\}$ by $v_i = \phi(e_i) = e_j a^j{}_i$ where, by Theorem 11.17, we have $\det\phi = \det(a^j{}_i)$. We then have $g(v_i, v_j) = a^r{}_i a^s{}_j\,g(e_r, e_s)$ which in matrix notation is $[g]_v = A^T[g]_e A$, and hence
$$\det(g(v_i, v_j)) = (\det\phi)^2\det(g(e_r, e_s)) . \tag{9}$$
However, since $\{e_i\}$ is $g$-orthonormal we have $g(e_r, e_s) = \pm\delta_{rs}$, and therefore $|\det(g(e_r, e_s))| = 1$. In other words
$$|\det(g(v_i, v_j))|^{1/2} = |\det\phi| . \tag{10}$$
Returning to our problem, we have $\det(g(f_i, f_j)) = \pm 1$ also since $\{f_i\} = \{\phi(e_i)\}$ is $g$-orthonormal. Thus (10) implies that $|\det\phi| = 1$. But $\{f_i\}$ is positively oriented so that $\mu(f_1, \dots, f_n) > 0$ by definition. Therefore
$$\begin{aligned} 0 < \mu(f_1, \dots, f_n) &= \mu(\phi(e_1), \dots, \phi(e_n)) = (\phi^*\mu)(e_1, \dots, e_n) \\ &= (\det\phi)\,\mu(e_1, \dots, e_n) = \det\phi \end{aligned}$$
so that we must in fact have $\det\phi = +1$. In other words, $\mu(f_1, \dots, f_n) = 1$ as claimed.

Now suppose that $\{v_i\}$ is an arbitrary positively oriented basis for $V$ such that $v_i = \phi(e_i)$. Then, analogously to what we have just shown, we see that $\mu(v_1, \dots, v_n) = \det\phi > 0$. Hence (10) shows that (using Example 11.8)
$$\begin{aligned} \mu(v_1, \dots, v_n) &= \det\phi \\ &= |\det(g(v_i, v_j))|^{1/2} \\ &= |\det(g(v_i, v_j))|^{1/2}\,v^1\wedge\cdots\wedge v^n(v_1, \dots, v_n) \end{aligned}$$
which implies
$$\mu = |\det(g(v_i, v_j))|^{1/2}\,v^1\wedge\cdots\wedge v^n .$$
∎
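Equation (10) can be tested numerically. The hypothetical Python sketch below takes the Lorentz metric $\operatorname{diag}(1, 1, -1)$ on $\mathbb{R}^3$ in a Lorentz frame and an arbitrary basis $\{v_i\}$ (positively oriented with respect to the standard orientation, since $\det V = 4 > 0$), forms the matrix $g(v_i, v_j)$, and compares $\mu(v_1, \dots, v_n) = |\det(g(v_i, v_j))|^{1/2}$ with $|\det\phi| = |\det V|$.

```python
import math
from itertools import permutations

def det(m):
    """Leibniz expansion of the determinant of a square matrix."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= m[i][p[i]]
        total += term
    return total

def gram(vectors, g):
    """Matrix g(v_i, v_j) for vectors given by components in a g-orthonormal basis."""
    n = len(g)
    return [[sum(g[r][s] * v[r] * w[s] for r in range(n) for s in range(n))
             for w in vectors] for v in vectors]

def g_volume_of_basis(vectors, g):
    """mu(v_1, ..., v_n) = |det g(v_i, v_j)|^(1/2) for a positively oriented basis."""
    return math.sqrt(abs(det(gram(vectors, g))))

eta = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]     # Lorentz frame, Ind = 1
vs = [(1, 0, 1), (0, 2, 0), (1, 1, 3)]
V = [[v[i] for v in vs] for i in range(3)]   # columns are the v_i, so V is the matrix of phi
print(g_volume_of_basis(vs, eta), abs(det(V)))   # equation (10): both equal |det phi|
```

Equation (9) is visible as well: here $\det(g(v_i, v_j)) = -16 = (\det\phi)^2\det(\eta_{ij})$.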

The unique volume form $\mu$ defined in Theorem 11.20 is called the $g$-volume, or sometimes the metric volume form. A common (although rather careless) notation is to write $|\det(g(v_i, v_j))|^{1/2} = \sqrt{|g|}$. In this notation, the $g$-volume is written as
$$\sqrt{|g|}\,v^1\wedge\cdots\wedge v^n$$
where $\{v_1, \dots, v_n\}$ must be positively oriented. If the basis $\{v_1, v_2, \dots, v_n\}$ is negatively oriented, then clearly $\{v_2, v_1, \dots, v_n\}$ will be positively oriented. Furthermore, even though the matrix of $g$ relative to each of these oriented bases will be different, the determinant actually remains unchanged (see the discussion following the corollary to Theorem 11.13). Therefore, for this negatively oriented basis, the $g$-volume is
$$\sqrt{|g|}\,v^2\wedge v^1\wedge\cdots\wedge v^n = -\sqrt{|g|}\,v^1\wedge v^2\wedge\cdots\wedge v^n .$$
We thus have the following corollary to Theorem 11.20.

Corollary Let $\{v_i\}$ be any basis for the $n$-dimensional oriented vector space $(V, [\omega])$ with metric $g$. Then the $g$-volume form on $V$ is given by
$$\pm\sqrt{|g|}\,v^1\wedge\cdots\wedge v^n$$
where the "+" sign is for $\{v_i\}$ positively oriented, and the "−" sign is for $\{v_i\}$ negatively oriented.

Example 11.14 From Example 11.13, we see that for a Riemannian metric g and g-orthonormal basis $\{e_i\}$ we have $\det(g(e_i, e_j)) = +1$. Hence, from equation (9), we see that $\det(g(v_i, v_j)) > 0$ for any basis $\{v_i = \phi(e_i)\}$. Thus the g-volume form on a Riemannian space is given by $\pm\sqrt{g}\, v^1 \wedge \cdots \wedge v^n$.

For a Lorentz metric we have $\det(\eta(e_i, e_j)) = -1$ in a Lorentz frame, and therefore $\det(g(v_i, v_j)) < 0$ in an arbitrary frame. Thus the g-volume in a Lorentz space is given by $\pm\sqrt{-g}\, v^1 \wedge \cdots \wedge v^n$.

Let us point out that had we defined $\operatorname{Ind}(\eta) = n - 1$ instead of $\operatorname{Ind}(\eta) = 1$, then $\det(\eta(e_i, e_j)) = (-1)^{n-1}$ would be negative only in an even-dimensional space. In this case, we would have to write the g-volume as in the above corollary. ∆
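The sign claims above follow from $\det(g(v_i, v_j)) = \det(\eta)\,(\det V)^2$, where V is the matrix whose rows are the components of the $v_i$ in a Lorentz frame. A minimal Python sketch (standard library only) checks this on random bases; the helper names are ours, and `diag_metric(n, index)` is an illustrative diagonal metric with `index` entries equal to $-1$.

```python
import random

def det(m):
    """Determinant by cofactor expansion (adequate for these small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def gram(vectors, g):
    """Matrix (g(v_i, v_j)) for vectors given by their components."""
    n = len(g)
    return [[sum(u[a] * g[a][b] * w[b] for a in range(n) for b in range(n))
             for w in vectors] for u in vectors]

def diag_metric(n, index):
    """Diagonal metric with `index` entries -1 followed by n - index entries +1."""
    return [[(-1.0 if a < index else 1.0) if a == b else 0.0 for b in range(n)]
            for a in range(n)]

def gram_det_signs(n, index, trials=20):
    """Set of observed truth values of det(g(v_i, v_j)) < 0 over random bases."""
    eta = diag_metric(n, index)
    seen = set()
    for _ in range(trials):
        vs = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
        if abs(det(vs)) < 1e-3:   # skip nearly dependent sets of vectors
            continue
        seen.add(det(gram(vs, eta)) < 0)
    return seen

random.seed(0)
for n in (3, 4):
    # Ind = 1 always gives a negative determinant; Ind = n - 1 only for even n.
    print(n, gram_det_signs(n, 1), gram_det_signs(n, n - 1))
```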

Example 11.15 (This example is a continuation of Example 11.11.) We remark that these volume elements are of great practical use in the theory of integration on manifolds. To see an example of how this is done, let us use Examples 11.1 and 11.11 to write the volume element as (remember that this applies only locally, and hence the metric depends on the coordinates)

$$d\tau = \sqrt{|g|}\, dx^1 \wedge \cdots \wedge dx^n .$$

If we go to a new coordinate system $\{\bar x^i\}$, then

$$\bar g_{ij} = \frac{\partial x^r}{\partial \bar x^i}\frac{\partial x^s}{\partial \bar x^j}\, g_{rs}$$

so that $|\bar g| = (J^{-1})^2 |g|$ where $J^{-1} = \det(\partial x^r/\partial \bar x^i)$ is the determinant of the inverse Jacobian matrix of the transformation. But using $d\bar x^i = (\partial \bar x^i/\partial x^j)\, dx^j$ and the properties of the wedge product, it is easy to see that

$$
\begin{aligned}
d\bar x^1 \wedge \cdots \wedge d\bar x^n
&= \frac{\partial \bar x^1}{\partial x^{i_1}} \cdots \frac{\partial \bar x^n}{\partial x^{i_n}}\, dx^{i_1} \wedge \cdots \wedge dx^{i_n} \\
&= \det\left(\frac{\partial \bar x^i}{\partial x^j}\right) dx^1 \wedge \cdots \wedge dx^n
\end{aligned}
$$

and hence

$$d\bar x^1 \wedge \cdots \wedge d\bar x^n = J\, dx^1 \wedge \cdots \wedge dx^n$$

where J is the determinant of the Jacobian matrix. (Note that the proper transformation formula for the volume element in multiple integrals arises naturally in the algebra of exterior forms.) We now have

$$
\begin{aligned}
d\bar\tau &= \sqrt{|\bar g|}\, d\bar x^1 \wedge \cdots \wedge d\bar x^n = J^{-1}\sqrt{|g|}\, J\, dx^1 \wedge \cdots \wedge dx^n \\
&= \sqrt{|g|}\, dx^1 \wedge \cdots \wedge dx^n = d\tau
\end{aligned}
$$

and hence $d\tau$ is a scalar called the invariant volume element. In the case of $\mathbb{R}^4$ as a Lorentz space, this result is used in the theory of relativity. ∆
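To see the cancellation $J^{-1} \cdot J$ in a concrete case, take Euclidean $\mathbb{R}^2$ (so $g_{ij} = \delta_{ij}$ and $\sqrt{|g|} = 1$) with polar coordinates $(\bar x^1, \bar x^2) = (r, \theta)$; this choice of example is ours. The sketch (Python, standard library only) computes $\sqrt{|\bar g|}$ from the inverse Jacobian and J from the Jacobian, and checks that $\sqrt{|\bar g|}\, J = \sqrt{|g|}$, i.e. $r\, dr \wedge d\theta = dx \wedge dy$.

```python
import math

def polar_volume_check(r, theta):
    """Return (sqrt|gbar|, J, sqrt|gbar| * J) for xbar = (r, theta) on Euclidean R^2."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    # Inverse Jacobian: inv_jac[k][i] = d x^k / d xbar^i, with xbar = (r, theta).
    inv_jac = [[math.cos(theta), -r * math.sin(theta)],
               [math.sin(theta), r * math.cos(theta)]]
    # gbar_ij = (dx^k/dxbar^i)(dx^l/dxbar^j) g_kl with g_kl = delta_kl.
    gbar = [[sum(inv_jac[k][i] * inv_jac[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    sqrt_gbar = math.sqrt(abs(gbar[0][0] * gbar[1][1] - gbar[0][1] * gbar[1][0]))
    # Jacobian: jac[i][j] = d xbar^i / d x^j for r = hypot(x, y), theta = atan2(y, x).
    rho = math.hypot(x, y)
    jac = [[x / rho, y / rho],
           [-y / rho ** 2, x / rho ** 2]]
    J = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
    return sqrt_gbar, J, sqrt_gbar * J

print(polar_volume_check(2.0, 0.7))  # approximately (2.0, 0.5, 1.0)
```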



Exercises 



1. Suppose V has a metric defined on it. Show that for any $A, B \in V$ we have $\langle A, B \rangle = a_i b^i = a^i b_i$.

2. According to the special theory of relativity, the speed of light is the same for all unaccelerated observers regardless of the motion of the source of light relative to the observer. Consider two observers moving at a constant velocity $\beta$ with respect to each other, and assume that the origins of their respective coordinate systems coincide at $t = 0$. If a spherical pulse of light is emitted from the origin at $t = 0$, then (in units where the speed of light is equal to 1) this pulse satisfies the equation $x^2 + y^2 + z^2 - t^2 = 0$ for the first observer, and $\bar x^2 + \bar y^2 + \bar z^2 - \bar t^2 = 0$ for the second observer. We shall use the common notation $(t, x, y, z) = (x^0, x^1, x^2, x^3)$ for our coordinates, and hence the Lorentz metric takes the form

$$\eta_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

where $0 \le \mu, \nu \le 3$.

(a) Let the Lorentz transformation matrix be $\Lambda$ so that $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$. Show that the Lorentz transformation must satisfy $\Lambda^T \eta \Lambda = \eta$.

(b) If the $\{\bar x^\mu\}$ system moves along the $x^1$-axis with velocity $\beta$, then it turns out that the Lorentz transformation is given by

$$
\begin{aligned}
\bar x^0 &= \gamma(x^0 - \beta x^1) \\
\bar x^1 &= \gamma(x^1 - \beta x^0) \\
\bar x^2 &= x^2 \\
\bar x^3 &= x^3
\end{aligned}
$$

where $\gamma^2 = 1/(1 - \beta^2)$. Using $\Lambda^\mu{}_\nu = \partial \bar x^\mu/\partial x^\nu$, write out the matrix $(\Lambda^\mu{}_\nu)$, and verify explicitly that $\Lambda^T \eta \Lambda = \eta$.

(c) The electromagnetic field tensor is given by

$$F_{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix} .$$

Using this, find the components of the electric field E and magnetic field B in the $\{\bar x^\mu\}$ coordinate system. In other words, find $\bar F_{\mu\nu}$. (The actual definition of $F_{\mu\nu}$ is given by $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ where $\partial_\mu = \partial/\partial x^\mu$ and $A_\mu = (\phi, A_1, A_2, A_3)$ is related to E and B through the classical equations $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ and $\mathbf{B} = \nabla \times \mathbf{A}$. See also Exercise 11.1.6.)

3. Let V be an n-dimensional vector space with a Lorentz metric $\eta$, and let W be an $(n - 1)$-dimensional subspace of V. Note that

$$W^\perp = \{v \in V : \eta(v, w) = 0 \text{ for all } w \in W\}$$

is the 1-dimensional subspace of all normal vectors for W. We say that W is timelike if every normal vector is spacelike, null if every normal vector is null, and spacelike if every normal vector is timelike. Prove that $\eta$ restricted to W is

(a) Positive definite if W is spacelike.

(b) A Lorentz metric if W is timelike.

(c) Degenerate if W is null.

4. (a) Let D be a $3 \times 3$ determinant considered as a function of three contravariant vectors $A^i_{(1)}$, $A^i_{(2)}$, and $A^i_{(3)}$. Show that under a change of coordinates, D does not transform as a scalar, but that $D\sqrt{|g|}$ does transform as a proper scalar. [Hint: Use Exercise 11.2.8.]

(b) Show that $\varepsilon_{ijk}\sqrt{|g|}$ transforms like a tensor. (This is the Levi-Civita tensor in general coordinates. Note that in a g-orthonormal coordinate system this reduces to the Levi-Civita symbol.)

(c) What is the contravariant version of the tensor in part (b)?
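A numerical spot-check of Exercise 2(b), not a substitute for the explicit verification the exercise asks for: the sketch below (Python, standard library only) builds the matrix $(\Lambda^\mu{}_\nu) = (\partial\bar x^\mu/\partial x^\nu)$ of the boost and measures how far $\Lambda^T\eta\Lambda$ is from $\eta = \operatorname{diag}(-1, 1, 1, 1)$ for a few assumed values of $\beta$. The helper names are ours.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def boost(beta):
    """Matrix (Lambda^mu_nu) = (d xbar^mu / d x^nu) for the boost in Exercise 2(b)."""
    gamma = 1.0 / (1.0 - beta ** 2) ** 0.5
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def max_defect(beta):
    """Largest entry of |Lambda^T eta Lambda - eta|; zero up to rounding for a Lorentz map."""
    lam = boost(beta)
    m = matmul(matmul(transpose(lam), ETA), lam)
    return max(abs(m[i][j] - ETA[i][j]) for i in range(4) for j in range(4))

print(max_defect(0.6))
```

The (0, 0) entry reduces to $-\gamma^2(1 - \beta^2) = -1$, which is where the condition $\gamma^2 = 1/(1 - \beta^2)$ enters.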



CHAPTER 12 



Hilbert Spaces 



The material to be presented in this chapter is essential for all advanced work 
in physics and analysis. We have attempted to present several relatively difficult
theorems in sufficient detail that they are readily understandable by
readers with less background than normally might be required for such results. 
However, we assume that the reader is quite familiar with the contents of 
Appendices A and B, and we will frequently refer to results from these 
appendices. Essentially, this chapter serves as an introduction to the theory of 
infinite-dimensional vector spaces. Throughout this chapter we let E, F and G 
denote normed vector spaces over the real or complex number fields only. 

12.1 MATHEMATICAL PRELIMINARIES 

This rather long first section presents the elementary properties of limits and 
continuous functions. While most of this material properly falls under the 
heading of analysis, we do not assume that the reader has already had such a 
course. However, if these topics are familiar, then the reader should briefly 
scan the theorems of this section now, and return only for details if and when 
it becomes necessary. 

For ease of reference, we briefly repeat some of our earlier definitions and results. By a norm on a vector space E, we mean a mapping $\|\cdot\|: E \to \mathbb{R}$ satisfying:

(N1) $\|u\| \ge 0$ for every $u \in E$ and $\|u\| = 0$ if and only if $u = 0$ (positive definiteness).
(N2) $\|cu\| = |c|\,\|u\|$ for every $u \in E$ and $c \in \mathcal{F}$.
(N3) $\|u + v\| \le \|u\| + \|v\|$ (triangle inequality).

If there is more than one norm on E under consideration, then we may denote them by subscripts such as $\|\cdot\|_2$ etc. Similarly, if we are discussing more than one space, then the norm associated with a space E will sometimes be denoted by $\|\cdot\|_E$. We call the pair $(E, \|\cdot\|)$ a normed vector space.

If E is a complex vector space, we define the Hermitian inner product as the mapping $(\cdot\,, \cdot): E \times E \to \mathbb{C}$ such that for all $u, v, w \in E$ and $c \in \mathbb{C}$ we have:

(IP1) $(u, v + w) = (u, v) + (u, w)$.

(IP2) $(cu, v) = c^*(u, v)$.

(IP3) $(u, v) = (v, u)^*$.

(IP4) $(u, u) \ge 0$ and $(u, u) = 0$ if and only if $u = 0$.

where * denotes complex conjugation. A Hermitian inner product is sometimes called a sesquilinear form. Note that (IP2) and (IP3) imply

$$(u, cv) = (cv, u)^* = c(v, u)^* = c(u, v)$$

and that (IP3) implies $(v, v)$ is real.

As usual, if $(u, v) = 0$ we say that u and v are orthogonal, and we sometimes write this as $u \perp v$. If we let S be a subset of E, then the collection

$$\{u \in E : (u, v) = 0 \text{ for every } v \in S\}$$

is a subspace of E called the orthogonal complement of S, and is denoted by $S^\perp$.

It should be remarked that many authors define $(cu, v) = c(u, v)$ rather than our (IP2), and the reader must be careful to note which definition is being followed. Furthermore, there is no reason why we could not have defined a mapping $E \times E \to \mathbb{R}$, and in this case we have simply an inner product on E (where obviously there is now no complex conjugation).

The most common example of a Hermitian inner product is the standard inner product on $\mathbb{C}^n = \mathbb{C} \times \cdots \times \mathbb{C}$ defined for all $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\mathbb{C}^n$ by

$$(x, y) = \sum_{i=1}^n x_i^* y_i .$$






We leave it to the reader to verify conditions (IP1) through (IP4). Before defining a norm on $\mathbb{C}^n$, we again prove (in a slightly different manner from that in Chapter 2) the Cauchy-Schwarz inequality.

Example 12.1 Let E be a complex (or real) inner product space, and let $u, v \in E$ be nonzero vectors. Then for any $a, b \in \mathbb{C}$ we have

$$0 \le (au + bv, au + bv) = |a|^2(u, u) + a^*b(u, v) + b^*a(v, u) + |b|^2(v, v) .$$

Now note that the middle two terms are complex conjugates of each other, and hence their sum is $2\operatorname{Re}(a^*b(u, v))$. Therefore, letting $a = (v, v)$ and $b = -(v, u)$, we have

$$0 \le (v, v)^2(u, u) - 2(v, v)|(u, v)|^2 + |(u, v)|^2(v, v)$$

which is equivalent to

$$(v, v)|(u, v)|^2 \le (v, v)^2(u, u) .$$

Since $v \ne 0$ we have $(v, v) > 0$ by (IP4), and hence dividing by $(v, v)$ and taking the square root yields the desired result

$$|(u, v)| \le (u, u)^{1/2}(v, v)^{1/2} .$$

∆
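The argument of Example 12.1 can be traced numerically. The sketch below (Python, standard library only) uses the standard inner product on $\mathbb{C}^5$, written to be conjugate-linear in the first slot as (IP2) requires, with randomly chosen (assumed) vectors u and v. It checks that the specific choice $a = (v, v)$, $b = -(v, u)$ gives $(au + bv, au + bv) = (v, v)^2(u, u) - (v, v)|(u, v)|^2 \ge 0$, and that the resulting inequality holds.

```python
import random

def inner(x, y):
    """Standard Hermitian inner product on C^n: conjugate-linear in the FIRST slot (IP2)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def random_vector(n):
    return [complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(n)]

random.seed(1)
u, v = random_vector(5), random_vector(5)

# The choice a = (v, v), b = -(v, u) made in Example 12.1.
a, b = inner(v, v), -inner(v, u)
w = [a * ui + b * vi for ui, vi in zip(u, v)]
ww = inner(w, w).real  # (au + bv, au + bv), which must be >= 0
predicted = (inner(v, v) ** 2 * inner(u, u) - inner(v, v) * abs(inner(u, v)) ** 2).real

lhs = abs(inner(u, v))
rhs = inner(u, u).real ** 0.5 * inner(v, v).real ** 0.5
print(ww >= 0, abs(ww - predicted) < 1e-9 * max(1.0, abs(predicted)), lhs <= rhs)
```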

If a vector space E has an inner product defined on it, then we may define a norm on E by

$$\|v\| = (v, v)^{1/2}$$

for all $v \in E$. Properties (N1) and (N2) for this norm are obvious, and (N3) now follows from the Cauchy-Schwarz inequality and the fact that $\operatorname{Re}(u, v) \le |(u, v)|$:

$$
\begin{aligned}
\|u + v\|^2 &= (u + v, u + v) \\
&= \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2 \\
&\le \|u\|^2 + 2|(u, v)| + \|v\|^2 \\
&\le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 \\
&= (\|u\| + \|v\|)^2 .
\end{aligned}
$$

We leave it to the reader (Exercise 12.1.1) to prove the so-called parallelogram law in an inner product space $(E, (\cdot\,, \cdot))$:

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 .$$

The geometric meaning of this formula in $\mathbb{R}^2$ is that the sum of the squares of the diagonals of a parallelogram is equal to the sum of the squares of the sides. If $(u, v) = 0$, then the reader can also easily prove the Pythagorean theorem:

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 \quad \text{if } u \perp v .$$

In terms of the standard inner product on $\mathbb{C}^n$, we now define a norm on $\mathbb{C}^n$ by

$$\|x\|^2 = (x, x) = \sum_{i=1}^n |x_i|^2 .$$

The above results now show that this does indeed satisfy the requirements of a norm.
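The identities just stated can be spot-checked for this norm on $\mathbb{C}^4$. In the minimal sketch below (Python, standard library only; the random vectors and helper names are our assumptions), the orthogonal vector needed for the Pythagorean theorem is produced by subtracting from v its component $cu$ with $c = (u, v)/(u, u)$, so that $(u, v - cu) = (u, v) - c(u, u) = 0$.

```python
import random

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5   # ||x||^2 = sum |x_i|^2

def combo(x, y, s=1.0):
    """Return x + s*y componentwise."""
    return [a + s * b for a, b in zip(x, y)]

random.seed(2)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]

triangle_ok = norm(combo(u, v)) <= norm(u) + norm(v)
parallelogram_gap = (norm(combo(u, v)) ** 2 + norm(combo(u, v, -1.0)) ** 2
                     - 2 * norm(u) ** 2 - 2 * norm(v) ** 2)

# Remove from v its component along u, giving v_perp with (u, v_perp) = 0.
c = inner(u, v) / inner(u, u)
v_perp = combo(v, [c * t for t in u], -1.0)
pythagoras_gap = norm(combo(u, v_perp)) ** 2 - norm(u) ** 2 - norm(v_perp) ** 2
print(triangle_ok, abs(parallelogram_gap) < 1e-9, abs(pythagoras_gap) < 1e-9)
```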

Continuing, if $(E, \|\cdot\|)$ is a normed space, then we may make E into a metric space $(E, d)$ by defining

$$d(u, v) = \|u - v\| .$$

Again, the only part of the definition of a metric space (see Appendix A) that is not obvious is (M4), and this now follows from (N3) because

$$d(u, v) = \|u - v\| = \|u - w + w - v\| \le \|u - w\| + \|w - v\| = d(u, w) + d(w, v) .$$

The important point to get from all this is that normed vector spaces form 
a special class of metric spaces. This means that all the results from Appendix 
A and many of the results from Appendix B will carry over to the case of 
normed spaces. In Appendix B we presented the theory of sequences and 
series of numbers. As we explained there however, many of the results are 
valid as well for normed vector spaces if we simply replace the absolute value 
by the norm. 

For example, suppose $A \subset E$ and let $v \in E$. Recall that v is said to be an accumulation point of A if every open ball centered at v contains a point of A distinct from v. In other words, given $\varepsilon > 0$ there exists $u \in A$, $u \ne v$, such that $\|u - v\| < \varepsilon$. As expected, if $\{v_n\}$ is a sequence of vectors in E, then we say that $\{v_n\}$ converges to the limit $v \in E$ if given $\varepsilon > 0$, there exists an integer $N > 0$ such that $n > N$ implies $\|v_n - v\| < \varepsilon$. As usual, we write $\lim v_n = \lim_{n\to\infty} v_n = v$. If there exists a neighborhood of v (i.e., an open ball containing v) such that v is the only point of A in this neighborhood, then we say that v is an isolated point of A.
Example 12.2 Suppose $\lim v_n = v$. Then for every $\varepsilon > 0$, there exists N such that $n > N$ implies $\|v - v_n\| < \varepsilon$. From Example 2.11 we then see that

$$\big|\, \|v\| - \|v_n\| \,\big| \le \|v - v_n\| < \varepsilon$$

and hence directly from the definition of $\lim \|v_n\|$ we have

$$\|\lim v_n\| = \|v\| = \lim \|v_n\| .$$

This result will be quite useful in several later proofs. ∆

Note that if v is an accumulation point of A, then for every $n > 0$ there exists $v_n \in A$, $v_n \ne v$, such that $\|v_n - v\| < 1/n$. In particular, for any $\varepsilon > 0$, choose N so that $1/N < \varepsilon$. Then for all $n > N$ we have $\|v_n - v\| < 1/n < 1/N < \varepsilon$ so that $\{v_n\}$ converges to v. Conversely, it is clear that if $v_n \to v$ with $v_n \in A$, $v_n \ne v$, then v must necessarily be an accumulation point of A. This proves the following result.

Theorem 12.1 If $A \subset E$, then v is an accumulation point of A if and only if it is the limit of some sequence in $A - \{v\}$.

A function $f: (X, d_X) \to (Y, d_Y)$ is said to be continuous at $x_0 \in X$ if for each $\varepsilon > 0$ there exists $\delta > 0$ such that $d_X(x, x_0) < \delta$ implies $d_Y(f(x), f(x_0)) < \varepsilon$. Note though, that for any given $\varepsilon$, the $\delta$ required will in general be different for each point $x_0$ chosen. If f is continuous at each point of X, then we say that f is "continuous on X."

A function f as defined above is said to be uniformly continuous on X if for each $\varepsilon > 0$, there exists $\delta > 0$ such that for all $x, y \in X$ with $d_X(x, y) < \delta$ we have $d_Y(f(x), f(y)) < \varepsilon$. The important difference between continuity and uniform continuity is that for a uniformly continuous function, once $\varepsilon$ is chosen, there is a single $\delta$ (which will generally still depend on $\varepsilon$) such that this definition applies to all points $x, y \in X$ subject only to the requirement that $d_X(x, y) < \delta$. It should be clear that a uniformly continuous function is necessarily continuous, but the converse is not generally true. We do though have the next very important result. However, since we shall not have any occasion to refer to it in this text, we present it only for its own sake and as an (important and useful) illustration of the concepts involved.

Theorem 12.2 Let $A \subset (X, d_X)$ be compact, and let $f: A \to (Y, d_Y)$ be continuous. Then f is uniformly continuous. In other words, a continuous function on a compact set is uniformly continuous.
Proof Fix $\varepsilon > 0$. Since f is continuous on A, for each point $x \in A$ there exists $\delta_x > 0$ such that for all $y \in A$, $d_X(x, y) < \delta_x$ implies $d_Y(f(x), f(y)) < \varepsilon/2$. The collection $\{B(x, \delta_x/2) : x \in A\}$ of open balls clearly covers A, and since A is compact, a finite number will cover A. Let $\{x_1, \dots, x_n\}$ be the finite collection of points such that $\{B(x_i, \delta_{x_i}/2)\}$, $i = 1, \dots, n$ covers A, and define $\delta = (1/2)\min(\{\delta_{x_i}\})$. Since each $\delta_{x_i} > 0$, $\delta$ must also be $> 0$. (Note that if A were not compact, then $\delta = \inf(\{\delta_x\})$ taken over all $x \in A$ could be equal to 0.)

Now let $x, y \in A$ be any two points such that $d_X(x, y) < \delta$. Since the collection $\{B(x_i, \delta_{x_i}/2)\}$ covers A, x must lie in some $B(x_i, \delta_{x_i}/2)$, and hence $d_X(x, x_i) < \delta_{x_i}/2$ for this particular $x_i$. Then we also have

$$d_X(y, x_i) \le d_X(x, y) + d_X(x, x_i) < \delta + \delta_{x_i}/2 \le \delta_{x_i} .$$

But f is continuous at $x_i$, and $\delta_{x_i}$ was defined so that the set of points z for which $d_X(z, x_i) < \delta_{x_i}$ satisfies $d_Y(f(z), f(x_i)) < \varepsilon/2$. Since we just showed that x and y satisfy $d_X(x, x_i) < \delta_{x_i}/2 < \delta_{x_i}$ and $d_X(y, x_i) < \delta_{x_i}$, we must have

$$d_Y(f(x), f(y)) \le d_Y(f(x), f(x_i)) + d_Y(f(y), f(x_i)) < \varepsilon/2 + \varepsilon/2 = \varepsilon .$$

In other words, for our given $\varepsilon$, we found a $\delta$ such that for all $x, y \in A$ with $d_X(x, y) < \delta$, we have $d_Y(f(x), f(y)) < \varepsilon$. ∎
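To illustrate both the remark that continuity does not imply uniform continuity and Theorem 12.2, consider the function $f(x) = 1/x$; this example is our own choice, not one from the text. Solving $1/(x_0 - \delta) - 1/x_0 = \varepsilon$ gives the largest admissible $\delta(x_0) = \varepsilon x_0^2/(1 + \varepsilon x_0)$, which is increasing in $x_0$ and tends to 0 as $x_0 \to 0$. So no single $\delta$ serves all of the non-compact set $(0, 1]$, while on the compact interval $[a, 1]$ the value $\delta(a) > 0$ works everywhere. The Python sketch below (standard library only) computes these values and spot-checks them on sample points.

```python
def best_delta(x0, eps):
    """Largest delta such that 0 < x, |x - x0| < delta imply |1/x - 1/x0| < eps.

    Solving 1/(x0 - delta) - 1/x0 = eps for delta gives eps*x0**2 / (1 + eps*x0).
    """
    return eps * x0 ** 2 / (1.0 + eps * x0)

def delta_works(x0, eps, delta, samples=1000):
    """Spot-check |1/x - 1/x0| < eps at points strictly inside (x0 - delta, x0 + delta)."""
    for k in range(1, samples):
        x = x0 - delta + 2.0 * delta * k / samples
        if abs(1.0 / x - 1.0 / x0) >= eps:
            return False
    return True

eps = 0.1
points = [1.0, 0.1, 0.01, 0.001]
deltas = [best_delta(x0, eps) for x0 in points]
print(deltas)  # shrinks toward 0 as x0 -> 0: no single delta serves all of (0, 1]

a = 0.01   # on the compact interval [a, 1] the worst point is x0 = a
uniform_delta = best_delta(a, eps)
```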

Example 12.3 Consider the function $f: E \to \mathbb{R}$ defined by $f(u) = \|u\|$. In other words, f is just the norm function on E. Referring to the above discussion, we say that a function g is uniformly continuous if given $\varepsilon > 0$, there exists $\delta > 0$ such that $\|u - v\| < \delta$ implies that $|g(u) - g(v)| < \varepsilon$ (note that the norm on E is $\|\cdot\|$ while the norm on $\mathbb{R}$ is $|\cdot|$). But for our norm function f and for any $\varepsilon > 0$, we see that for all $u, v \in E$, if we choose $\delta = \varepsilon$ then $\|u - v\| < \delta = \varepsilon$ implies

$$|f(u) - f(v)| = \big|\, \|u\| - \|v\| \,\big| \le \|u - v\| < \varepsilon$$

(where we used Example 2.11). Thus the norm is in fact uniformly continuous on E.

We leave it to the reader (see Exercise 12.1.2) to show (using the Cauchy-Schwarz inequality) that the inner product on E is also continuous in both variables. ∆
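The choice $\delta = \varepsilon$ in Example 12.3 can be exercised numerically. In this sketch (Python, standard library only), the Euclidean norm on $\mathbb{R}^3$ stands in for a general norm, and random pairs with $\|u - v\| < \delta$ are generated; for each pair it checks the reverse triangle inequality (up to a rounding tolerance of $10^{-12}$) and $|\,\|u\| - \|v\|\,| < \varepsilon$. The sampling scheme is an assumption of ours.

```python
import math
import random

def norm(x):
    """Euclidean norm on R^n (any norm would do for this check)."""
    return math.sqrt(sum(t * t for t in x))

random.seed(3)
eps = 1e-3
delta = eps          # the choice delta = eps from Example 12.3
violations = 0
for _ in range(1000):
    u = [random.uniform(-10.0, 10.0) for _ in range(3)]
    d = [random.gauss(0.0, 1.0) for _ in range(3)]
    step = random.uniform(0.0, 0.999 * delta) / max(norm(d), 1e-300)
    v = [ui + step * di for ui, di in zip(u, d)]          # so that ||u - v|| < delta
    gap = abs(norm(u) - norm(v))
    dist = norm([ui - vi for ui, vi in zip(u, v)])
    if dist >= delta or gap > dist + 1e-12 or gap >= eps:
        violations += 1
print(violations)  # 0
```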

There is an equivalent way to define continuous functions in terms of limits that is also of great use. Let X and Y be metric spaces, and suppose $f: A \subset X \to Y$. Then if $x_0$ is an accumulation point of A, we say that a point $L \in Y$ is the limit of f at $x_0$ if, given $\varepsilon > 0$, there exists $\delta > 0$ (which may depend on f, $x_0$ and $\varepsilon$) such that for all $x \in A$ we have $0 < d_X(x, x_0) < \delta$ implies $d_Y(f(x), L) < \varepsilon$. This is written as $\lim_{x \to x_0} f(x) = L$ or simply "$f(x) \to L$ as $x \to x_0$."

Note that while $x_0$ is an accumulation point of A, $x_0$ is not necessarily an element of A, and hence $f(x_0)$ might not be defined. In addition, even if $x_0 \in A$, it is not necessarily true that $\lim_{x \to x_0} f(x) = f(x_0)$. However, we do have the following result.

Theorem 12.3 If $f: A \subset (X, d_X) \to (Y, d_Y)$ and $x_0 \in A$ is an accumulation point of A, then f is continuous at $x_0$ if and only if

$$\lim_{x \to x_0} f(x) = f(x_0) .$$

Proof This is obvious by comparison of the definition of continuity of f at $x_0$ and the definition of the limit of f at $x_0$. ∎

Before we can prove the basic properties of continuous functions, we must prove some elementary properties of limits. First we need a definition. A product on $E \times F \to G$ is a mapping denoted by $(u, v) \mapsto uv$ that is bilinear and satisfies $\|uv\|_G \le \|u\|_E \|v\|_F$. For example, using the Cauchy-Schwarz inequality, we see that the usual inner product on $\mathbb{R}^n$ is just a product on $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.

Example 12.4 We say that a function $f: S \to F$ is bounded if there exists $M > 0$ such that $\|f(x)\| \le M$ for all $x \in S$. Now consider the space $E = \mathcal{B}(S, \mathbb{R})$ of real-valued bounded functions on any nonempty set S. Let us define a norm $\|\cdot\|_\infty$ on $\mathcal{B}(S, \mathbb{R})$ by

$$\|f\|_\infty = \sup_{x \in S} |f(x)|$$

for any $f \in \mathcal{B}(S, \mathbb{R})$. This important norm is called the sup norm. For any $f, g \in \mathcal{B}(S, \mathbb{R})$ suppose $\|f\|_\infty = C_1$ and $\|g\|_\infty = C_2$. Then it follows that $|f(x)| \le C_1$ and $|g(x)| \le C_2$ for all $x \in S$. But then for all $x \in S$ we have

$$|f(x)g(x)| = |f(x)|\,|g(x)| \le C_1 C_2 = \|f\|_\infty \|g\|_\infty$$

so that the usual product of (real-valued) functions is also bounded. Therefore we see that

$$\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$$

and since the usual product is obviously bilinear, we have a (general) product on $E \times E \to E$. ∆
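For a finite set S the supremum is simply a maximum, which makes the product inequality of Example 12.4 easy to check directly. The sketch below (Python, standard library only) uses a hypothetical 101-point set $S \subset [0, 1]$ and two arbitrary sample functions of our choosing, and also checks the triangle inequality (N3) for the sup norm.

```python
import math

# Hypothetical finite set S = {0, 0.01, ..., 1}; on a finite set the sup is a max.
S = [k / 100 for k in range(101)]
f = [math.sin(7.0 * x) for x in S]
g = [x * x - 0.5 for x in S]

def sup_norm(values):
    """||h||_inf = sup over S of |h(x)|, for h given by its values on S."""
    return max(abs(t) for t in values)

fg = [a * b for a, b in zip(f, g)]          # pointwise product
f_plus_g = [a + b for a, b in zip(f, g)]    # pointwise sum
print(sup_norm(fg), sup_norm(f) * sup_norm(g))
```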
With the notion of a product carefully defined, we can repeat parts (a) through (c) of Theorem B2 in a more general form as follows. The proof is virtually identical to that of Theorem B2 except that here we replace the absolute value by the norm.

Theorem 12.4 (a) Let $u_n \to u$ and $v_n \to v$ be convergent sequences of vectors in E. Then $\lim_{n\to\infty}(u_n + v_n) = u + v$.

(b) Let $v_n \to v$ be a convergent sequence of vectors in E, and let c be a scalar. Then $\lim_{n\to\infty}(cv_n) = cv$.

(c) Let $u_n \to u$ and $v_n \to v$ be convergent sequences of vectors in E and F respectively, and let $E \times F \to G$ be a product. Then $\lim_{n\to\infty}(u_n v_n) = uv$.

Theorem 12.5 (a) Suppose that $A \subset E$ and v is an accumulation point of A. Let f and g be mappings of A into F, and assume that $\lim_{u\to v} f(u) = w_1$ and $\lim_{u\to v} g(u) = w_2$. Then

$$\lim_{u\to v} (f + g)(u) = w_1 + w_2 .$$

(b) Let A be a subset of some normed space, and let v be an accumulation point of A. Let $f: A \to E$ and $g: A \to F$ be mappings, and assume further that $\lim_{u\to v} f(u) = w_1$ and $\lim_{u\to v} g(u) = w_2$. If $E \times F \to G$ is a product, then

$$\lim_{u\to v} f(u)g(u) = w_1 w_2 .$$

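Proof (a) Given $\varepsilon > 0$, there exists $\delta_1 > 0$ such that if $u \in A$ with $\|u - v\| < \delta_1$, then $\|f(u) - w_1\| < \varepsilon/2$. Similarly, there exists $\delta_2 > 0$ such that $\|u - v\| < \delta_2$ implies $\|g(u) - w_2\| < \varepsilon/2$. Choosing $\delta = \min\{\delta_1, \delta_2\}$ we see that if $u \in A$ and $\|u - v\| < \delta$ we have

$$
\begin{aligned}
\|(f + g)(u) - (w_1 + w_2)\| &= \|f(u) - w_1 + g(u) - w_2\| \\
&\le \|f(u) - w_1\| + \|g(u) - w_2\| \\
&< \varepsilon/2 + \varepsilon/2 \\
&= \varepsilon .
\end{aligned}
$$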

(b) Given $\varepsilon > 0$, there exists $\delta_1 > 0$ such that $\|u - v\| < \delta_1$ implies

$$\|f(u) - w_1\| < \varepsilon/[2(1 + \|w_2\|)] .$$

Similarly, there exists $\delta_2 > 0$ such that $\|u - v\| < \delta_2$ implies

$$\|g(u) - w_2\| < \varepsilon/[2(1 + \|w_1\|)] .$$






From the definition of limit, given $\varepsilon = 1$ there exists $\delta_3 > 0$ such that $\|u - v\| < \delta_3$ implies

$$\|f(u) - w_1\| < 1 .$$

But from Example 2.11 we see that

$$\|f(u)\| - \|w_1\| \le \|f(u) - w_1\| < 1$$

which implies

$$\|f(u)\| < 1 + \|w_1\| .$$

If we let $\delta = \min\{\delta_1, \delta_2, \delta_3\}$, then for all $u \in A$ with $\|u - v\| < \delta$ we have

$$
\begin{aligned}
\|f(u)g(u) - w_1 w_2\| &= \|f(u)g(u) - f(u)w_2 + f(u)w_2 - w_1 w_2\| \\
&\le \|f(u)\|\,\|g(u) - w_2\| + \|f(u) - w_1\|\,\|w_2\| \\
&< (1 + \|w_1\|)\,\varepsilon/[2(1 + \|w_1\|)] + \varepsilon/[2(1 + \|w_2\|)]\,\|w_2\| \\
&< \varepsilon/2 + \varepsilon/2 \\
&= \varepsilon .
\end{aligned}
$$

∎

The reader should realize that the norms used in the last proof are not 
defined on the same normed space. However, it would have been too cluttered 
for us to distinguish between them, and this practice is usually followed by 
most authors. 

It will also be of use to formulate the limit of a composition of mappings. 

Theorem 12.6 Suppose $A \subset E$ and $B \subset F$, and let $f: A \to B$ and $g: B \to G$ be mappings. Assume that u is an accumulation point of A and that $\lim_{x\to u} f(x) = v$. Assume also that v is an accumulation point of B and that $\lim_{y\to v} g(y) = w$. Then

$$\lim_{x\to u} (g \circ f)(x) = \lim_{x\to u} g(f(x)) = w .$$

Proof Given $\varepsilon > 0$, there exists $\delta_1 > 0$ such that for all $y \in B$ with $\|y - v\| < \delta_1$, we have $\|g(y) - w\| < \varepsilon$. Then given this $\delta_1$, there exists $\delta_2 > 0$ such that for all $x \in A$ with $\|x - u\| < \delta_2$, we have $\|f(x) - v\| < \delta_1$. But now letting $y = f(x)$, we see that for such an $x \in A$ we must have $\|g(f(x)) - w\| < \varepsilon$. ∎

We are now in a position to prove the basic properties of continuous func- 
tions on a normed space. 
Theorem 12.7 (a) If $A \subset E$ and $f, g: A \to F$ are continuous at $v \in A$, then the sum $f + g$ is continuous at v.

(b) Let $f: A \to E$ and $g: A \to F$ be continuous at $v \in A$ and suppose that $E \times F \to G$ is a product. Then the product map fg is continuous at v.

(c) Suppose $A \subset E$, $B \subset F$, and let $f: A \to B$ and $g: B \to G$ be mappings. Assume that f is continuous at $v \in A$ with $f(v) = w$, and assume that g is continuous at w. Then the composite map $g \circ f$ is continuous at v.

(d) A mapping $f: E \to F$ is continuous at v if and only if for every sequence $\{v_n\}$ in E we have $v_n \to v$ implies $f(v_n) \to f(v)$. In other words, we have $\lim f(v_n) = f(\lim v_n)$.

Proof (a) If v is an isolated point there is nothing to prove, so assume that v is an accumulation point. Then by Theorems 12.5 and 12.3 we have

$$
\begin{aligned}
\lim_{u\to v}(f + g)(u) &= \lim_{u\to v} f(u) + \lim_{u\to v} g(u) \\
&= f(v) + g(v) \\
&= (f + g)(v) .
\end{aligned}
$$

(b) This also follows from Theorems 12.5 and 12.3.

(c) Left to the reader (see Exercise 12.1.3).

(d) We first assume that f is continuous at v, and that the sequence $\{v_n\}$ converges to v. We must show that $f(v_n) \to f(v)$. Now, since f is continuous, we know that given $\varepsilon > 0$, there exists $\delta > 0$ such that $\|u - v\| < \delta$ implies $\|f(u) - f(v)\| < \varepsilon$. Furthermore, the convergence of $\{v_n\}$ means that given $\delta > 0$, there exists N such that $\|v_n - v\| < \delta$ for every $n > N$. Therefore, for every $n > N$ we have $\|v_n - v\| < \delta$, which implies $\|f(v_n) - f(v)\| < \varepsilon$.

We now prove that if f is not continuous at v, then there exists a convergent sequence $v_n \to v$ for which $f(v_n) \not\to f(v)$. It will be notationally simpler for us to formulate this proof in terms of open balls defined by the induced metric (see Appendix A). If f is not continuous at v, then there exists $B(f(v), \varepsilon)$ with no corresponding $B(v, \delta)$ such that $f(B(v, \delta)) \subset B(f(v), \varepsilon)$. Consider the sequence of open balls $\{B(v, 1/n)\}$ for $n = 1, 2, \dots$. Since f is not continuous, we can find $v_n \in B(v, 1/n)$ such that $f(v_n) \notin B(f(v), \varepsilon)$. It is clear that the sequence $\{v_n\}$ converges to v (given any $\varepsilon' > 0$, choose $n > N = 1/\varepsilon'$), but that $f(v_n)$ does not converge to $f(v)$ since by construction $B(f(v), \varepsilon)$ contains none of the $f(v_n)$. ∎

Since the notion of open sets is extremely important in much of what follows, it is natural to wonder whether different norms defined on a space lead to different open sets (through their induced metrics). We shall say that two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ defined on E are equivalent if there exists a number C such that for all $u \in E$ we have

$$C^{-1}\|u\|_1 \le \|u\|_2 \le C\|u\|_1 .$$

We leave it to the reader to show that this defines an equivalence relation on the set of all norms on E (see Exercise 12.1.4).

Example 12.5 It is easy to see that this definition does exactly what we want it to do. For example, suppose $U \subset E$ is open relative to a norm $\|\cdot\|_1$. This means that for any $u \in U$, there exists $\varepsilon_1 > 0$ such that $\|u - v\|_1 < \varepsilon_1$ implies $v \in U$. We would like to show that given an equivalent norm $\|\cdot\|_2$, then there exists $\varepsilon_2 > 0$ such that $\|u - v\|_2 < \varepsilon_2$ implies $v \in U$. We know there exists $C > 0$ such that $C^{-1}\|\cdot\|_1 \le \|\cdot\|_2 \le C\|\cdot\|_1$, and hence choosing $\varepsilon_2 = \varepsilon_1/C$, it follows that for all $v \in E$ with $\|u - v\|_2 < \varepsilon_2$ we have

$$\|u - v\|_1 \le C\|u - v\|_2 < C\varepsilon_2 = \varepsilon_1$$

so that $v \in U$. Therefore we have shown that if a set is open with respect to one norm, then it is open with respect to any equivalent norm. ∆
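A concrete instance of Example 12.5, under our own choice of norms: on $\mathbb{R}^n$ take $\|\cdot\|_1$ to be the max norm $\|x\|_\infty = \max_i |x_i|$ and $\|\cdot\|_2$ the Euclidean norm. Then $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$, so $C = \sqrt{n}$ works. The Python sketch below (standard library only) checks these inequalities on random vectors and confirms that random points within $\varepsilon_2 = \varepsilon_1/C$ of u in the Euclidean norm lie within $\varepsilon_1$ of u in the max norm.

```python
import math
import random

def norm_inf(x):
    return max(abs(t) for t in x)

def norm_2(x):
    return math.sqrt(sum(t * t for t in x))

n = 5
C = math.sqrt(n)   # C^{-1}||x||_inf <= ||x||_2 <= C ||x||_inf on R^n

random.seed(4)
u = [random.uniform(-1.0, 1.0) for _ in range(n)]
eps1 = 0.1            # radius of a max-norm ball around u, assumed to lie in U
eps2 = eps1 / C       # the radius chosen in Example 12.5

equiv_ok, inclusion_ok = True, True
for _ in range(2000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    equiv_ok &= norm_inf(x) / C <= norm_2(x) <= C * norm_inf(x) * (1 + 1e-12)
    # a random point v with ||u - v||_2 < eps2 ...
    step = random.uniform(0.0, 0.999 * eps2) / norm_2(x)
    v = [ui + step * xi for ui, xi in zip(u, x)]
    # ... must satisfy ||u - v||_inf < eps1
    inclusion_ok &= norm_inf([ui - vi for ui, vi in zip(u, v)]) < eps1
print(equiv_ok, inclusion_ok)
```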

Example 12.6 It is also easy to give an example of two non-equivalent norms defined on a space. To see this, consider the space E of all real-valued continuous functions defined on [0, 1]. We define a norm on E by means of the scalar product. Thus for any $f, g \in E$ we define the scalar product by

$$(f, g) = \int_0^1 f(x)g(x)\, dx$$

and the associated norm by $\|f\|_2 = (f, f)^{1/2}$. This norm is usually called the $L^2$-norm. Alternatively, we note that any continuous real function defined on [0, 1] must be bounded (Theorems A8 and A14). Hence we may also define the sup norm $\|f\|_\infty$ by

$$\|f\|_\infty = \sup |f(x)|$$

where the sup is taken over all $x \in [0, 1]$.

Now suppose $f \in E$ and write $\|f\|_\infty = C$. Then we have

$$\int_0^1 [f(x)]^2\, dx \le \int_0^1 C^2\, dx = C^2$$

and hence $\|f\|_2 \le \|f\|_\infty$. However, this is only half of the inequalities required by the definition. Consider the peaked function defined on [0, 1] by

[Figure: a narrow peak of height 1 on the interval [0, 1].]

If we let this function become arbitrarily narrow while maintaining the height, it is clear that the sup norm will always be equal to 1, but that the $L^2$-norm can be made arbitrarily small. ∆
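To put numbers on the peaked function, assume it is a "tent" of height 1 centered at $x = 1/2$ with base width w (the exact shape in the original figure is not recoverable, so this shape is our assumption). Then $\|f\|_\infty = 1$ while $\|f\|_2^2 = \int_0^1 f^2\, dx = w/3$. The sketch below (Python, standard library only) approximates $\|f\|_2$ with the midpoint rule and shows $\|f\|_2 \to 0$ as $w \to 0$, so no constant C can give $\|f\|_\infty \le C\|f\|_2$.

```python
import math

def tent(x, width, center=0.5):
    """Hypothetical peaked function: height 1 at `center`, 0 outside a base of `width`."""
    return max(0.0, 1.0 - abs(x - center) / (width / 2.0))

def sup_norm(width, n=100_000):
    # the grid k/n contains the peak 0.5 exactly, so this max equals 1.0
    return max(tent(k / n, width) for k in range(n + 1))

def l2_norm(width, n=100_000):
    # midpoint rule for (integral_0^1 f(x)^2 dx)^(1/2)
    return math.sqrt(sum(tent((k + 0.5) / n, width) ** 2 for k in range(n)) / n)

widths = [0.5, 0.1, 0.01, 0.001]
sups = [sup_norm(w) for w in widths]
l2s = [l2_norm(w) for w in widths]
print(sups)   # [1.0, 1.0, 1.0, 1.0]
print(l2s)    # close to sqrt(width / 3)
```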

The problem that arose in this example is a consequence of the fact that the space E of continuous functions defined on [0, 1] is infinite-dimensional. In fact, we will soon prove that this cannot occur in finite-dimensional spaces. In other words, we will see that all norms are equivalent in a finite-dimensional space.

The reader may wonder whether or not the limits we have defined depend in any way on the particular norm being used. It is easy to show that if the limit of a sequence exists with respect to one norm, then it exists with respect to any other equivalent norm, and in fact the limits are equal (see Exercise 12.1.5). It should now be clear that a function that is continuous at a point v with respect to a norm $\|\cdot\|_1$ is also continuous at v with respect to any equivalent norm $\|\cdot\|_2$.

Now recall from Appendix B that a metric space in which every Cauchy sequence converges to a point in the space is said to be complete. It was also shown there that the space $\mathbb{R}^n$ is complete with respect to the standard norm (Theorem B8), and hence so is the space $\mathbb{C}^n$ (since $\mathbb{C}^n$ may be thought of as $\mathbb{R}^n \times \mathbb{R}^n = \mathbb{R}^{2n}$). Recall also that a Banach space is a normed vector space $(E, \|\cdot\|)$ that is complete as a metric space (where as usual, the metric is that induced by the norm). If an inner product space $(E, (\cdot\,, \cdot))$ is complete as a metric space (again with the metric defined by the norm induced by the inner product), then E is called a Hilbert space.

It is natural to wonder whether a space that is complete relative to one norm is necessarily complete relative to any other equivalent norm. This is answered by the next theorem. In the proof that follows, it will be convenient to use the nonstandard norm $\|\cdot\|_N$ defined on $\mathbb{R}^n$ (or $\mathbb{C}^n$) by

$$\|(u^1, \dots, u^n)\|_N = \sum_{i=1}^n |u^i|$$

where $(u^1, \dots, u^n)$ is a vector n-tuple in $\mathbb{R}^n$ (or $\mathbb{C}^n$). In $\mathbb{R}^2$, the unit ball $\{(u^1, u^2) : \|(u^1, u^2)\|_N \le 1\}$ looks like

[Figure: the unit ball of $\|\cdot\|_N$ in $\mathbb{R}^2$, a square with vertices $(1, 0)$, $(0, 1)$, $(-1, 0)$ and $(0, -1)$.]

Theorem 12.8 Let E be a finite-dimensional vector space over either R or C. 
Then 

(a) There exists a norm on E. 

(b) All norms on E are equivalent. 

(c) All norms on E are complete. 

Proof Let $\{e_1, \dots, e_n\}$ be a basis for E so that any $u \in E$ may be written as

$$u = \sum u^i e_i .$$

(a) We define the norm $\|\cdot\|_1$ on E by

$$\|u\|_1 = \sum_{i=1}^n |u^i| .$$

Properties (N1) and (N2) are trivial to verify, and if $v = \sum v^i e_i$ is any other vector in E, then $u + v = \sum (u^i + v^i)e_i$, and hence

$$\|u + v\|_1 = \sum |u^i + v^i| \le \sum (|u^i| + |v^i|) = \sum |u^i| + \sum |v^i| = \|u\|_1 + \|v\|_1$$

so that (N3) is also satisfied.

This norm is quite convenient for a number of purposes. Note that it yields the same result for any $v = \sum v^i e_i \in E$ as does the nonstandard norm $\|\cdot\|_N$ for the corresponding $(v^1, \dots, v^n) \in \mathbb{R}^n$ (or $\mathbb{C}^n$).

(b) Let $\|\cdot\|_2$ be any other norm on E, and let $u, v \in E$ be arbitrary. Using Example 2.11 and properties (N3) and (N2), we see that

$$
\begin{aligned}
\big|\, \|u\|_2 - \|v\|_2 \,\big| &\le \|u - v\|_2 \\
&= \Big\|\sum (u^i - v^i)e_i\Big\|_2 \\
&\le \sum \|(u^i - v^i)e_i\|_2 \\
&= \sum |u^i - v^i|\,\|e_i\|_2 \\
&\le \max_{1\le i\le n}\{\|e_i\|_2\} \sum |u^i - v^i| \\
&= \max_{1\le i\le n}\{\|e_i\|_2\}\, \|(u^1, \dots, u^n) - (v^1, \dots, v^n)\|_N . \qquad (*)
\end{aligned}
$$

Define the mapping $f: \mathbb{C}^n \to \mathbb{R}$ by $x = (x^1, \dots, x^n) \mapsto \|\sum x^i e_i\|_2 \in [0, \infty)$. To say that f is uniformly continuous on $\mathbb{C}^n$ with respect to the norm $\|\cdot\|_N$ means that given $\varepsilon > 0$, we can find $\delta > 0$ such that for all $x, y \in \mathbb{C}^n$ with

$$\|x - y\|_N = \|(x^1, \dots, x^n) - (y^1, \dots, y^n)\|_N < \delta$$

we have

$$|f(x) - f(y)| = \Big|\, \Big\|\sum x^i e_i\Big\|_2 - \Big\|\sum y^i e_i\Big\|_2 \,\Big| < \varepsilon .$$

If we define $B = \max_{1\le i\le n}\{\|e_i\|_2\}$, then choosing $\delta = \varepsilon/B$, we see that (*) shows that f is (uniformly) continuous with respect to the norm $\|\cdot\|_N$ on $\mathbb{C}^n$.

We now note that the unit sphere $S = \{x \in \mathbb{C}^n : \|x\|_N = 1\}$ is closed and bounded, and hence S is compact (Theorem A14). The restriction of the function f to S is then continuous and strictly positive (by (N1)), and hence according to Theorem A15, f attains both its minimum m and maximum M on S. In other words, for every $x = (x^1, \dots, x^n) \in \mathbb{C}^n$ with $\|x\|_N = 1$ we have $0 < m \le \|\sum x^i e_i\|_2 \le M$. Since $\|x\|_N = \|(x^1, \dots, x^n)\|_N = 1$, we may write this as

$$m\|(x^1, \dots, x^n)\|_N \le \Big\|\sum x^i e_i\Big\|_2 \le M\|(x^1, \dots, x^n)\|_N$$

and since all three terms are multiplied by $|c|$ when x is replaced by $cx$ (by (N2)), this inequality in fact holds for every $x \in \mathbb{C}^n$ (it is trivial for $x = 0$). Hence, using part (a) with $u = \sum x^i e_i \in E$, we find that

$$m\|u\|_1 \le \|u\|_2 \le M\|u\|_1 .$$

Choosing $C = \max\{1/m, M\}$, we see that $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent. The fact that $\|\cdot\|_2$ was arbitrary combined with the fact that equivalent norms form an equivalence class completes the proof that all norms on E are equivalent.

(c) It suffices to show that E is complete with respect to any particular norm on E. This is because part (b), together with the fact that a sequence that is Cauchy (respectively, convergent) with respect to one norm is Cauchy (respectively, convergent) with respect to any equivalent norm, then shows that E will be complete with respect to any norm. Recall from the corollary to Theorem 2.8 that E is isomorphic to either $\mathbb{R}^n$ or $\mathbb{C}^n$. The result then follows from Theorem B8 or its obvious extension to $\mathbb{C}^n$ (see Exercise 12.1.6). ∎
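The constants m and M of part (b) can be computed exactly in a small case. Assume (our choice) $E = \mathbb{R}^2$ with the hypothetical non-orthogonal basis $e_1 = (1, 0)$, $e_2 = (1, 1)$, and let $\|\cdot\|_2$ be the Euclidean norm. The $\|\cdot\|_N$ unit sphere is the square with vertices $(\pm 1, 0)$, $(0, \pm 1)$, and its image under $x \mapsto \sum x^i e_i$ is the polygon through $e_1, e_2, -e_1, -e_2$. So m is the smallest distance from the origin to an edge of that polygon, and M (the maximum of a convex function on the sphere) is attained at a vertex. The Python sketch below (standard library only) computes $m = 1/\sqrt5$, $M = \sqrt2$, and $C = \max\{1/m, M\} = \sqrt5$.

```python
import math
import random

E_BASIS = [(1.0, 0.0), (1.0, 1.0)]   # hypothetical non-orthogonal basis e_1, e_2 of R^2

def vec(x):
    """The vector sum_i x^i e_i for coordinates x = (x^1, x^2)."""
    return tuple(x[0] * E_BASIS[0][k] + x[1] * E_BASIS[1][k] for k in range(2))

def norm_euclid(p):        # plays the role of the arbitrary norm ||.||_2
    return math.hypot(p[0], p[1])

def norm_1(x):             # ||u||_1 = sum |u^i| from part (a)
    return abs(x[0]) + abs(x[1])

def dist_to_segment(p, q):
    """Euclidean distance from the origin to the segment [p, q]."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    t = max(0.0, min(1.0, -(p[0] * dx + p[1] * dy) / (dx * dx + dy * dy)))
    return norm_euclid((p[0] + t * dx, p[1] + t * dy))

# Images of the vertices (1,0), (0,1), (-1,0), (0,-1) of the ||.||_N unit sphere.
corners = [vec(c) for c in [(1, 0), (0, 1), (-1, 0), (0, -1)]]
m = min(dist_to_segment(corners[i], corners[(i + 1) % 4]) for i in range(4))
M = max(norm_euclid(c) for c in corners)   # a norm's max on this sphere is at a vertex
C = max(1.0 / m, M)
print(m, M, C)   # approximately 1/sqrt(5), sqrt(2), sqrt(5)
```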

We shall see that closed subspaces play an important role in the theory of Hilbert spaces. Because of this, we must make some simple observations. Suppose that Y is a closed subset of a complete space (X, d), and let $\{x_n\}$ be a Cauchy sequence in Y. Then $\{x_n\}$ is also obviously a Cauchy sequence in X, and hence $x_n \to x \in X$. But this means that $x \in \operatorname{Cl} Y = Y$ (Theorem B13(b) or B14(a)) so that $\{x_n\}$ converges in Y.

On the other hand, suppose that Y is a complete subset of an arbitrary metric space (X, d) and let $\{x_n\}$ be any sequence in Y that converges to an element $x \in X$. We claim that in fact $x \in Y$ which will prove that Y is closed (Theorem B14(a)). Since $x_n \to x \in X$, it follows that $\{x_n\}$ is a Cauchy sequence in X (since any convergent sequence is necessarily Cauchy). In other words, given $\varepsilon > 0$ there exists $N > 0$ such that $m, n > N$ implies $d(x_m, x_n) < \varepsilon$. But then $\{x_n\}$ is just a Cauchy sequence in Y (which is complete), and hence $x_n \to y$ for some $y \in Y$. Since limits in a metric space are unique, $x = y \in Y$.

This discussion proves the next result. 

Theorem 12.9 Any closed subset of a complete metric space is also a complete metric space. On the other hand, if a subset of an arbitrary metric space is complete, then it is closed.

Corollary A finite-dimensional subspace of any real or complex normed vector space is closed.

Proof This follows from Theorems 12.8 and 12.9. ∎

Now suppose that $A \subset E$ and that we have a mapping $f\colon A \to F$ where $F = F_1 \times \cdots \times F_n$ is the Cartesian product of normed spaces. Then for any $v \in A$ we have $f(v) = (f_1(v), \dots, f_n(v))$, where each $f_i\colon A \to F_i$ is called the $i$th coordinate function of $f$. In other words, we write $f = (f_1, \dots, f_n)$.

If $w = (w_1, \dots, w_n) \in F$, then one possible norm on $F$, the sup norm, is defined by

$$\|w\| = \sup_{1 \le i \le n} \{\|w_i\|\}$$

where $\|w_i\|$ denotes the norm in $F_i$. However, this is not the only possible norm. Recall that if $x = (x_1, \dots, x_n) \in \mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$, then the standard norm in $\mathbb{R}^n$ is given by $\|x\|^2 = \sum_i |x_i|^2$. The analogous "Pythagorean" norm $\|\cdot\|_P$ on $F$ would then be defined by $\|w\|_P^2 = \sum_i \|w_i\|^2$. Alternatively, we could also define the "nonstandard" norm $\|w\|_N = \sum_i \|w_i\|$. We leave it to the reader to show that these three norms are equivalent on $F$ (see Exercise 12.1.7).
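The Python sketch below illustrates the three norms numerically, under the assumption (ours, for concreteness) that each factor $F_i$ is $\mathbb{R}^3$ with its Euclidean norm; the function name `product_norms` is hypothetical. It checks the chain $\|w\| \le \|w\|_P \le \|w\|_N \le n\|w\|$ on random samples, which is one convenient set of comparison constants. Random testing of this kind is, of course, no substitute for the proof asked for in Exercise 12.1.7.

```python
import math
import random

def euclid(v):
    """Euclidean norm, used here as the (assumed) norm on each factor F_i."""
    return math.sqrt(sum(c * c for c in v))

def product_norms(w, factor_norm=euclid):
    """Return the sup, Pythagorean and 'nonstandard' norms of w = (w_1, ..., w_n)."""
    r = [factor_norm(wi) for wi in w]          # r_i = ||w_i|| in F_i
    return max(r), math.sqrt(sum(t * t for t in r)), sum(r)

random.seed(1)
n = 4
for _ in range(1000):
    w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    sup, pyth, total = product_norms(w)
    assert sup <= pyth + 1e-12
    assert pyth <= total + 1e-12
    assert total <= n * sup + 1e-12

print(product_norms([(3, 4), (0, 1)]))  # factor norms 5 and 1: (5.0, 5.0990..., 6.0)
```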

The next result should come as no surprise. Note that we will use the sup norm on $F$ as defined above.

Theorem 12.10 Suppose $A \subset E$ and let $f\colon A \to F = F_1 \times \cdots \times F_n$ be a mapping. If $v$ is an accumulation point of $A$, then $\lim_{u \to v} f(u)$ exists if and only if $\lim_{u \to v} f_i(u)$ exists for each $i = 1, \dots, n$. If this is the case and if $\lim_{u \to v} f(u) = w = (w_1, \dots, w_n)$, then $\lim_{u \to v} f_i(u) = w_i$ for each $i = 1, \dots, n$.

Proof First assume that $\lim_{u \to v} f(u) = w = (w_1, \dots, w_n)$. This means that given $\varepsilon > 0$, there exists $\delta > 0$ such that $\|u - v\| < \delta$ implies $\|f(u) - w\| < \varepsilon$. If we write $f(u) = (f_1(u), \dots, f_n(u))$, then for all $u \in A$ with $\|u - v\| < \delta$, the definition of the sup norm tells us that

$$\|f_i(u) - w_i\| \le \|f(u) - w\| < \varepsilon .$$

This proves that $\lim_{u \to v} f_i(u) = w_i$.

Conversely, suppose $\lim_{u \to v} f_i(u) = w_i$ for each $i = 1, \dots, n$. Then given $\varepsilon > 0$, there exists $\delta_i > 0$ such that $\|u - v\| < \delta_i$ implies $\|f_i(u) - w_i\| < \varepsilon$. Defining $\delta = \min\{\delta_i\}$, we see that for all $u \in A$ with $\|u - v\| < \delta$, we have $\|f_i(u) - w_i\| < \varepsilon$ for each $i = 1, \dots, n$ and therefore

$$\|f(u) - w\| = \sup_i \{\|f_i(u) - w_i\|\} < \varepsilon .$$

This shows that $\lim_{u \to v} f(u) = w$. ∎

Corollary The mapping $f$ defined in Theorem 12.10 is continuous if and only if each $f_i$ is continuous.

Proof Obvious from the definitions. ∎
Exercises 

1. If $u, v \in (E, \langle\ ,\ \rangle)$, prove:

(a) The parallelogram law: $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$.

(b) The Pythagorean theorem: $\|u + v\|^2 = \|u\|^2 + \|v\|^2$ if $u \perp v$.

2. Show that an inner product on $E$ is continuous in both variables by showing that $\lim_{y \to y_0} \langle x, y\rangle = \langle x, y_0\rangle$ and $\lim_{x \to x_0} \langle x, y\rangle = \langle x_0, y\rangle$.

3. Prove part (c) of Theorem 12.7.

4. Show that equivalence of norms defines an equivalence relation on the set of all norms on $E$.

5. (a) Suppose $\{v_n\}$ is a sequence in a normed space $E$. Show that if $v_n \to v$ with respect to one norm on $E$, then $v_n \to v$ with respect to any equivalent norm on $E$.

(b) Show that if a function is continuous at a point with respect to one norm, then it is continuous at that point with respect to any equivalent norm.

6. Fill in the details in the proof of Theorem 12.8(c).

7. Let $F = F_1 \times \cdots \times F_n$ be a Cartesian product of normed spaces, and suppose $w = (w_1, \dots, w_n) \in F$. If $\|w_i\|$ denotes the norm on $F_i$, show that the norms $\|w\| = \sup_{1 \le i \le n}\{\|w_i\|\}$, $\|w\|_P^2 = \sum_{i=1}^n \|w_i\|^2$ and $\|w\|_N = \sum_{i=1}^n \|w_i\|$ are equivalent on $F$.

8. Show that the set $\mathcal{B}(S, E)$ of all bounded functions from a nonempty set $S$ to a normed vector space $E$ forms a vector space (over the same field as $E$).

12.2 OPERATOR NORMS 

Suppose $E$ and $F$ are normed spaces, and let $A\colon E \to F$ be a linear map. If there exists a number $M > 0$ such that $\|Av\|_F \le M\|v\|_E$ for all $v \in E$, then $A$ is said to be bounded, and the number $M$ is called a bound for $A$. In other words, to say that $A$ is bounded means that it maps bounded sets to bounded sets. Note that we labeled our norms in a way that denotes which space they are defined on. From now on, though, we shall not complicate our notation by this designation unless it is necessary. However, the reader should be careful to note that the symbol $\|\cdot\|$ may mean two different things within a single equation.

Recall also that a linear map $A\colon E \to F$ is said to be continuous at $v_0 \in E$ if given $\varepsilon > 0$, there exists $\delta > 0$ such that for all $v \in E$ with $\|v - v_0\| < \delta$, we have $\|Av - Av_0\| = \|A(v - v_0)\| < \varepsilon$.
Theorem 12.11 Let $E$ be a finite-dimensional normed space, and let $A\colon E \to F$ be a linear map of $E$ into a normed space $F$ (not necessarily finite-dimensional). Then $A$ is bounded.

Proof Let $\{e_1, \dots, e_n\}$ be a basis for $E$ so that any $v \in E$ may be written in the form $v = \sum v^i e_i$. Using the defining properties of the norm and the linearity of $A$, we then have

$$\|Av\| = \Big\|A \sum_i v^i e_i\Big\| = \Big\|\sum_i v^i A e_i\Big\| \le \sum_i \|v^i A e_i\| = \sum_i |v^i|\, \|A e_i\| .$$

Since all norms on $E$ are equivalent (Theorem 12.8), we use the norm $\|\cdot\|_1$ defined by $\|v\|_1 = \sum_i |v^i|$. Thus any other norm $\|\cdot\|_2$ on $E$ will be related to $\|\cdot\|_1$ by $C^{-1}\|\cdot\|_2 \le \|\cdot\|_1 \le C\|\cdot\|_2$ for some number $C$. Since $\|Ae_i\| < \infty$ for each $i$, we define the real number $M = \max\{\|Ae_i\|\}$. Then

$$\|Av\| \le M \sum_i |v^i| = M\|v\|_1 \le MC\|v\|_2 .$$

∎
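To see the bound from this proof in a concrete case, the Python sketch below assumes (for illustration only) that $E = \mathbb{R}^3$ with $\|v\|_1 = \sum_i |v^i|$ relative to the standard basis, that $F = \mathbb{R}^2$ with the Euclidean norm, and that $A$ is a particular $2 \times 3$ matrix of our own choosing. It computes $M = \max_i \|Ae_i\|$ and checks $\|Av\| \le M\|v\|_1$ on random vectors; taking $v = e_1$ shows that this particular $M$ cannot be lowered.

```python
import math
import random

def matvec(A, v):
    """Apply the matrix A (a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm1(v):
    return sum(abs(c) for c in v)

def norm2(v):
    return math.sqrt(sum(c * c for c in v))

# An illustrative linear map A: R^3 -> R^2 (our choice, not from the text).
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
n = len(A[0])
basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]

# M = max ||A e_i||, the bound constructed in the proof of Theorem 12.11.
M = max(norm2(matvec(A, e)) for e in basis)

random.seed(2)
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(n)]
    assert norm2(matvec(A, v)) <= M * norm1(v) + 1e-9

print(M, math.sqrt(10))  # both sqrt(10): the first column (1, 3) is the longest
```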

Our next result is quite fundamental, and will be referred to again several 
times. 

Theorem 12.12 Let $A\colon E \to F$ be a linear map of normed spaces. If $A$ is bounded, then $A$ is uniformly continuous, and if $A$ is continuous at $0$, then $A$ is bounded.

Proof If $A$ is bounded, then there exists $M > 0$ such that $\|Av\| \le M\|v\|$ for every $v \in E$. Then for any $\varepsilon > 0$, we choose $\delta = \varepsilon/M$ so that for all $u, v \in E$ with $\|u - v\| < \delta$, we have

$$\|Au - Av\| = \|A(u - v)\| \le M\|u - v\| < \varepsilon .$$

This proves that $A$ is uniformly continuous.

Conversely, suppose $A$ is continuous at $0 \in E$. Then given $\varepsilon = 1$, there exists $\delta > 0$ such that $\|v\| < \delta$ implies $\|Av\| < \varepsilon = 1$. In particular, we see that for any nonzero $v \in E$ we have $\|\delta v/(2\|v\|)\| = \delta/2 < \delta$, which implies $\|A(\delta v/(2\|v\|))\| < 1$. Taking out the constants yields $\|Av\| < (2/\delta)\|v\|$. This shows that $A$ is bounded with bound $M = 2/\delta$. ∎

As shown in Exercise 12.2.1, there is nothing special about the continuity 
of A at the origin. 
Corollary 1 A linear map $A\colon E \to F$ is bounded if and only if $A$ is continuous.

Proof This is immediate from Theorem 12.12: a bounded map is (uniformly) continuous, and a continuous map is in particular continuous at $0$, hence bounded. ∎

Corollary 2 Let $E$ be finite-dimensional, and let $A\colon E \to F$ be a linear map. Then $A$ is uniformly continuous.

Proof This follows directly from Theorems 12.11 and 12.12. ∎

Let $E$ and $F$ be normed spaces, and let $A\colon E \to F$ be a continuous (and hence bounded) linear map (if $E$ is finite-dimensional, then the continuity requirement is redundant). We define the operator norm of $A$ by

$$\|A\| = \sup\{\|Av\|/\|v\| : v \in E,\ v \ne 0\} = \sup\{\|Av\| : \|v\| = 1\} .$$

If $\|v\| \le 1$ and $v \ne 0$, then we may write $v = c\hat v$ where $\|\hat v\| = 1$ and $|c| \le 1$. Then $\|Av\| = |c|\,\|A\hat v\| \le \|A\hat v\|$, and therefore, since we are using the sup, an equivalent definition of $\|A\|$ is

$$\|A\| = \sup\{\|Av\| : \|v\| \le 1\} .$$

From the first definition, we see that for any nonzero $v \in E$ we have $\|Av\|/\|v\| \le \|A\|$, and hence (the case $v = 0$ being trivial) we have the important result

$$\|Av\| \le \|A\|\,\|v\| \quad \text{for all } v \in E .$$

Thus $\|A\|$ is itself a bound for $A$, while any bound $M$ satisfies $M \ge \|Av\|/\|v\|$ for every $v \ne 0$ and hence $M \ge \|A\|$. This shows that another equivalent definition of $\|A\|$ is

$$\|A\| = \inf\{M > 0 : \|Av\| \le M\|v\| \text{ for all } v \in E\} .$$

Another useful result follows by noting that if $A\colon E \to F$ and $B\colon F \to G$, then for any $v \in E$ we have

$$\|(B \circ A)v\| = \|B(Av)\| \le \|B\|\,\|Av\| \le \|B\|\,\|A\|\,\|v\|$$

and hence from the definition of the operator norm we have

$$\|B \circ A\| \le \|B\|\,\|A\| .$$
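For matrices the operator norm can sometimes be computed in closed form. As an illustration of our own (not from the text), give both $\mathbb{R}^n$ and $\mathbb{R}^m$ the sup norm $\|v\|_\infty = \max_i |v_i|$; it is a standard fact that the operator norm of an $m \times n$ matrix is then its largest absolute row sum. The Python sketch below checks this against the definition $\|A\| = \sup\{\|Av\|_\infty : \|v\|_\infty \le 1\}$, using the fact that the convex function $v \mapsto \|Av\|_\infty$ attains its maximum over the cube $\|v\|_\infty \le 1$ at a corner (a vector with entries $\pm 1$), and then tests $\|B \circ A\| \le \|B\|\,\|A\|$. The matrices and helper names are hypothetical.

```python
from itertools import product

def matvec(A, v):
    """Apply the matrix A (a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sup_norm(v):
    return max(abs(c) for c in v)

def op_norm_brute(A):
    """sup{||Av|| : ||v|| <= 1} in the sup norm, searched over the corners of the unit cube."""
    n = len(A[0])
    return max(sup_norm(matvec(A, v)) for v in product((-1, 1), repeat=n))

def op_norm_rows(A):
    """Closed form for the sup norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def matmul(B, A):
    """Matrix of the composition B o A."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[1, -2], [3, 4]]
B = [[0, 1], [-1, 2]]
print(op_norm_brute(A), op_norm_rows(A))                    # 7 7
BA = matmul(B, A)
print(op_norm_rows(BA), op_norm_rows(B) * op_norm_rows(A))  # 15 21
```

Here $\|B \circ A\| = 15$ is strictly smaller than $\|B\|\,\|A\| = 21$, so the submultiplicative bound need not be sharp.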
We denote the space of all continuous linear maps from $E$ to $F$ by $\mathcal{L}(E, F)$. That $\mathcal{L}(E, F)$ is in fact a vector space will be shown below. Since for any $A \in \mathcal{L}(E, F)$ we have $\|A\| = \sup\{\|Av\| : \|v\| \le 1\}$, we see that by restricting $A$ to the unit ball in $E$, the space $\mathcal{L}(E, F)$ may be regarded as a subspace of the space $\mathcal{B}(S, F)$ of all bounded maps from $S$ into $F$ that was defined in Example 12.4 (where $S$ is just the unit ball in $E$).

Theorem 12.13 The space $\mathcal{L}(E, F)$ with the operator norm is a normed vector space. Moreover, if $F$ is a Banach space, then so is $\mathcal{L}(E, F)$.

Proof Suppose that $A \in \mathcal{L}(E, F)$. We first verify requirements (N1)-(N3) for a norm. From the definitions, it is obvious that $\|A\| \ge 0$ and $\|0\| = 0$. In addition, if $\|A\| = 0$ then for any $v \in E$ we have $\|Av\| \le \|A\|\,\|v\| = 0$, which implies that $A = 0$. This verifies (N1). If $c \in \mathcal{F}$, then

$$\|cA\| = \sup\{\|(cA)v\| : \|v\| = 1\} = |c| \sup\{\|Av\| : \|v\| = 1\} = |c|\,\|A\|$$

which verifies (N2). Now let $A, B \in \mathcal{L}(E, F)$. Then using Theorem 0.5 we see that (leaving out the restriction $\|v\| = 1$)

$$\begin{aligned} \|A + B\| &= \sup\{\|(A + B)v\|\} = \sup\{\|Av + Bv\|\} \\ &\le \sup\{\|Av\| + \|Bv\|\} \le \sup\{\|Av\|\} + \sup\{\|Bv\|\} = \|A\| + \|B\| \end{aligned}$$

which proves (N3). That $\mathcal{L}(E, F)$ is in fact a vector space follows from Theorem 12.7(a) and (b).

Now suppose that $F$ is a Banach space and let $\{A_n\}$ be a Cauchy sequence in $\mathcal{L}(E, F)$. This means that for every $\varepsilon > 0$ there exists $N$ such that $m, n \ge N$ implies $\|A_m - A_n\| < \varepsilon$. In particular, for any nonzero $v \in E$ and any $\varepsilon > 0$, there exists $N$ such that $\|A_m - A_n\| < \varepsilon/\|v\|$ for all $m, n \ge N$, and hence

$$\|A_m v - A_n v\| = \|(A_m - A_n)v\| \le \|A_m - A_n\|\,\|v\| < (\varepsilon/\|v\|)\|v\| = \varepsilon ,$$

so that $\{A_n v\}$ is a Cauchy sequence in $F$ (for $v = 0$ this is trivial). Since $F$ is a Banach space this sequence converges, and hence we define $Av \in F$ by

$$Av = \lim_{n \to \infty} A_n v .$$
This defines a map $A\colon E \to F$. Since each $A_n$ is linear, it should be clear (using Theorem 12.4) that $A$ is linear. We must still show that $A$ is continuous (so that $A \in \mathcal{L}(E, F)$) and that $A_n \to A$.

Given $\varepsilon > 0$, there exists $N$ such that $\|A_m - A_n\| < \varepsilon$ for all $m, n \ge N$. If $v \in E$ is such that $\|v\| \le 1$, then

$$\|A_m v - A_n v\| \le \|A_m - A_n\|\,\|v\| < \varepsilon .$$

But $\{A_m v\}$ converges to $Av$, and hence (the norm being continuous) letting $m \to \infty$ yields

$$\|(A - A_n)v\| = \|Av - A_n v\| \le \varepsilon$$

for every $v \in E$ with $\|v\| \le 1$ and every $n \ge N$. By linearity this gives $\|(A - A_n)v\| \le \varepsilon\|v\|$ for all $v \in E$, so $A - A_n$ is bounded and hence continuous; that is, $A - A_n \in \mathcal{L}(E, F)$ (by Theorem 12.12 and its Corollary 1). Thus $A = (A - A_n) + A_n \in \mathcal{L}(E, F)$ (since each $A_n$ is). Finally, since $A - A_n \in \mathcal{L}(E, F)$, we may apply the definition of the operator norm to obtain

$$\|A - A_n\| = \sup\{\|(A - A_n)v\| : \|v\| \le 1\} \le \varepsilon$$

for every $n \ge N$, and hence $A_n \to A$. ∎

Exercises 

1. Let $A\colon E \to F$ be linear and continuous at some point $v_0 \in E$. Prove directly that $A$ is continuous at every $v \in E$.

2. (Linear Extension Theorem) Let $E$ be a normed vector space, $F$ a subspace of $E$, and $G$ a Banach space. Suppose $A \in \mathcal{L}(F, G)$ and assume that $\|A\| = M$. Prove:

(a) The closure $\bar F$ of $F$ in $E$ is a subspace of $E$.

(b) There exists a unique extension $\bar A \in \mathcal{L}(\bar F, G)$ of $A$. [Hint: If $v \in \bar F$, then there exists a sequence $\{v_n\}$ in $F$ such that $v_n \to v$ (why?). Show that $\{Av_n\}$ is Cauchy in $G$, and converges to a unique limit that is independent of $\{v_n\}$. Define $\bar A v = \lim A v_n$, and show that $\bar A$ is linear. Also show that $\bar A v = Av$ for any $v \in F$. Next, show that $\bar A$ is bounded so that $\bar A \in \mathcal{L}(\bar F, G)$ (why?), and finally show that the $\bar A$ so defined is unique.]

(c) $\|\bar A\| = \|A\|$.

3. Let $E$ be a normed vector space. A completion $(\bar E, A)$ of $E$ is a Banach space $\bar E$ together with a continuous injective linear mapping $A\colon E \to \bar E$ that preserves the norm and is such that $A(E)$ is dense in $\bar E$. Show that $(\bar E, A)$ is unique in the sense that if $(F, B)$ is another completion of $E$, then there exists a unique invertible element $C \in \mathcal{L}(\bar E, F)$ such that $B = C \circ A$. [Hint: Apply the previous exercise to the mappings $B \circ A^{-1}$ and $A \circ B^{-1}$.]

12.3 HILBERT SPACES 

Discussing infinite-dimensional vector spaces requires a certain amount of 
care that was not needed in our treatment of finite-dimensional spaces. For 
example, how are we to express an arbitrary vector as a linear combination of 
basis vectors? For that matter, how do we define a basis in an infinite- 
dimensional space? As another example, recall that in our treatment of 
operator adjoints, we restricted our discussion to finite-dimensional spaces 
(see Theorems 10.1 and 10.2). While we cannot define the adjoint in an 
arbitrary infinite-dimensional space (e.g., a Banach space), we shall see that it 
is nevertheless possible to make such a definition in a Hilbert space. 

Unfortunately, a thorough treatment of Hilbert spaces requires a knowl- 
edge of rather advanced integration theory (i.e., the Lebesgue theory). 
However, it is quite easy to present a fairly complete discussion of many of 
the basic and important properties of Hilbert spaces without using the general 
theory of integration. 

As in the previous sections of this chapter, we consider only vector spaces 
over the real and complex fields. In fact, unless otherwise noted we shall 
always assume that our scalars are complex numbers. Recall from Section 
12.1 that a Hilbert space is an inner product space which is complete as a 
metric space. We shall generally denote a Hilbert space by the letter H. 

To begin with, we recall that a linear space $E$ is $n$-dimensional if it contains a set of $n$ linearly independent vectors, but every set of $n + 1$ vectors is linearly dependent. If $E$ contains $n$ linearly independent vectors for every positive integer $n$, then we say that $E$ is infinite-dimensional. Let us rephrase some of our earlier discussion of (infinite) series in a terminology that fits in with the concept of an infinite-dimensional space.

We have already seen that a sequence $\{v_n\}$ of vectors in a space $(E, \|\cdot\|)$ converges to $v \in E$ if for each $\varepsilon > 0$, there exists an $N$ such that $n \ge N$ implies $\|v_n - v\| < \varepsilon$. We sometimes write this as $v_n \to v$ or $\|v_n - v\| \to 0$. Similarly, we say that an infinite linear combination $\sum_{k=1}^\infty a_k w_k$ of vectors in $E$ converges if the sequence of partial sums $v_n = \sum_{k=1}^n a_k w_k$ converges. In other words, to write $v = \sum_{k=1}^\infty a_k w_k$ means that $v_n \to v$. If no explicit limits on the sum are given, then we assume that the sum is over an infinite number of vectors.

Just as we did with finite linear combinations, we define the addition of infinite linear combinations componentwise. Thus, if $x = \sum a_n v_n$ and $y = \sum b_n v_n$, then we define the sum $x + y$ by $x + y = \sum (a_n + b_n)v_n$. If $x_n = \sum_{k=1}^n a_k v_k$ converges to $x$, and $y_n = \sum_{k=1}^n b_k v_k$ converges to $y$, then it is quite easy to see that $x_n + y_n = \sum_{k=1}^n (a_k + b_k)v_k$ converges to $x + y$ (see Exercise 12.3.1). Furthermore, we define scalar multiplication of an infinite linear combination $x = \sum a_n v_n$ by $cx = \sum (ca_n)v_n$. It is also easy to see that if the $n$th partial sum $x_n$ converges to $x$, then $cx_n$ converges to $cx$.

In our next two examples we define the general Banach spaces $\ell_p^n$ and $\ell_p$, and we then show that both $\ell_2^n$ and $\ell_2$ may be made into Hilbert spaces. (In more advanced work, the space $\ell_p$ may be generalized to include measure spaces.) Remember that our scalars may be either real or complex numbers.

Example 12.7 If $p$ is any real number such that $1 \le p < \infty$, we let $\ell_p^n$ denote the space of all scalar $n$-tuples $x = (x_1, \dots, x_n)$ with the norm $\|\cdot\|_p$ defined by

$$\|x\|_p = \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p} .$$

We first show that this does indeed define a norm on $\ell_p^n$. Properties (N1) and (N2) of the norm are obvious, so it remains to show that property (N3) is also obeyed. To show this, we will prove two general results that are of importance in their own right. In the derivation to follow, if $p$ occurs by itself, it is defined as above. If the numbers $p$ and $q$ occur together, then $q$ is defined the same way as $p$, but we also assume that $1/p + 1/q = 1$. (If $p$ and $q$ satisfy the relation $1/p + 1/q = 1$, then $p$ and $q$ are said to be conjugate exponents. Note that in this case both $p$ and $q$ are strictly greater than 1.)

Let $\alpha$ and $\beta$ be real numbers $\ge 0$. We first show that

$$\alpha^{1/p}\beta^{1/q} \le \frac{\alpha}{p} + \frac{\beta}{q} . \qquad (1)$$

This result is clear if either $\alpha$ or $\beta$ is zero, so assume that both $\alpha$ and $\beta$ are greater than zero. For any real $k \in (0, 1)$ define the function $f(t)$ for $t \ge 1$ by

$$f(t) = k(t - 1) - t^k + 1 .$$

From elementary calculus, we see that $f'(t) = k(1 - t^{k-1})$, and hence $f'(t) \ge 0$ for every $t \ge 1$ and $k \in (0, 1)$. Since $f(1) = 0$, this implies that $f(t) \ge 0$, and thus the definition of $f(t)$ shows that

$$t^k \le k(t - 1) + 1 = kt + (1 - k) .$$

If $\alpha \ge \beta$, we let $t = \alpha/\beta$ and $k = 1/p$ to obtain

$$(\alpha/\beta)^{1/p} \le \alpha/(\beta p) + (1 - 1/p) = \alpha/(\beta p) + 1/q .$$

Multiplying through by $\beta$ and using $\beta^{1 - 1/p} = \beta^{1/q}$ yields the desired result. Similarly, if $\alpha < \beta$ we let $t = \beta/\alpha$ and $k = 1/q$.
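Inequality (1) is easy to probe numerically. The short Python sketch below (the function name `young_gap` is our own) evaluates $\alpha/p + \beta/q - \alpha^{1/p}\beta^{1/q}$ on a grid of $\alpha, \beta \ge 0$ for a few exponents $p > 1$, with $q = p/(p - 1)$ the conjugate exponent. The gap should never be negative, and it vanishes when $\alpha = \beta$, since then the right side of (1) is $\alpha(1/p + 1/q) = \alpha$. A grid check of this kind only illustrates (1); it proves nothing.

```python
def young_gap(alpha, beta, p):
    """Return alpha/p + beta/q - alpha**(1/p) * beta**(1/q), where 1/p + 1/q = 1."""
    q = p / (p - 1)                      # conjugate exponent
    return alpha / p + beta / q - alpha ** (1 / p) * beta ** (1 / q)

grid = [i / 4 for i in range(21)]        # alpha, beta in {0, 0.25, ..., 5}
for p in (1.25, 1.5, 2.0, 3.0, 7.0):
    for a in grid:
        for b in grid:
            assert young_gap(a, b, p) >= -1e-12   # inequality (1)
        assert abs(young_gap(a, a, p)) < 1e-12    # equality when alpha = beta

print(young_gap(4.0, 1.0, 2.0))  # 4/2 + 1/2 - 2*1 = 0.5
```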

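To help see the meaning of (1), note that taking the logarithm of both sides of (1) yields

$$\frac{1}{p}\log\alpha + \frac{1}{q}\log\beta \le \log\Big(\frac{\alpha}{p} + \frac{\beta}{q}\Big) .$$

Since $1/p + 1/q = 1$, the reader should recognize this as the statement that the logarithm is a concave function: the chord joining the points $(\alpha, \log\alpha)$ and $(\beta, \log\beta)$ lies below the graph of $\log t$ (see the figure below).

[Figure: graph of $\log t$, with the value $(1/p)\log\alpha + (1/q)\log\beta$ marked.]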




We now use (1) to prove Hölder's inequality:

$$\sum_{i=1}^n |x_i y_i| \le \|x\|_p \|y\|_q .$$

This is trivially true if $x = 0$ or $y = 0$, so we assume that $x$ and $y$ are both nonzero. Define $\alpha_i = (|x_i|/\|x\|_p)^p$ and $\beta_i = (|y_i|/\|y\|_q)^q$. From (1) we see that

$$\frac{|x_i y_i|}{\|x\|_p \|y\|_q} \le \frac{\alpha_i}{p} + \frac{\beta_i}{q} .$$

Using the definition of $\|\cdot\|_p$, it follows that $\sum_i \alpha_i = 1$, and similarly $\sum_i \beta_i = 1$. Hence summing the previous inequality over $i = 1, \dots, n$ and using the fact that $1/p + 1/q = 1$ yields Hölder's inequality. We remark that the particular case $p = q = 2$ yields

$$\sum_{i=1}^n |x_i y_i| \le \Big(\sum_{i=1}^n |x_i|^2\Big)^{1/2} \Big(\sum_{i=1}^n |y_i|^2\Big)^{1/2}$$

which is called Cauchy's inequality.
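The following Python sketch (helper names are our own) checks Hölder's inequality on random real vectors for several conjugate pairs $(p, q)$. It also exercises one equality case: if $y_i = |x_i|^{p-1}$, then $|y_i|^q = |x_i|^p$ and both sides coincide, which shows that the inequality cannot be improved in general. We claim equality only for this particular choice of $y$.

```python
import random

def pnorm(x, p):
    """The l_p^n norm (sum |x_i|^p)^(1/p)."""
    return sum(abs(c) ** p for c in x) ** (1 / p)

def holder_sides(x, y, p):
    """Return (sum |x_i y_i|, ||x||_p ||y||_q) with q the conjugate exponent of p."""
    q = p / (p - 1)
    return sum(abs(a * b) for a, b in zip(x, y)), pnorm(x, p) * pnorm(y, q)

random.seed(3)
for p in (1.5, 2.0, 3.0, 5.0):
    for _ in range(500):
        x = [random.uniform(-3, 3) for _ in range(6)]
        y = [random.uniform(-3, 3) for _ in range(6)]
        lhs, rhs = holder_sides(x, y, p)
        assert lhs <= rhs + 1e-9

# Equality case: y_i = |x_i|^(p-1).
x, p = [1.0, 2.0, 3.0], 3.0
y = [abs(c) ** (p - 1) for c in x]      # [1.0, 4.0, 9.0]
print(holder_sides(x, y, p))            # both sides equal 36 up to rounding
```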

Finally, we use Hölder's inequality to prove Minkowski's inequality:

$$\|x + y\|_p \le \|x\|_p + \|y\|_p .$$

If $p = 1$ this is obvious since $|x_i + y_i| \le |x_i| + |y_i|$, so we may assume that $p > 1$. In this case we have

$$(\|x + y\|_p)^p = \sum_{i=1}^n |x_i + y_i|^p = \sum_{i=1}^n |x_i + y_i|\,|x_i + y_i|^{p-1} \le \sum_{i=1}^n |x_i|\,|x_i + y_i|^{p-1} + \sum_{i=1}^n |y_i|\,|x_i + y_i|^{p-1} . \qquad (2)$$

Using Hölder's inequality with $y_i$ replaced by $|x_i + y_i|^{p/q}$ results in the inequality

$$\sum_{i=1}^n |x_i|\,|x_i + y_i|^{p/q} \le \|x\|_p (\|x + y\|_p)^{p/q}$$

with a similar result if we interchange $x_i$ and $y_i$. Since $1/p + 1/q = 1$ implies $p/q = p - 1$, we now see that (2) yields

$$(\|x + y\|_p)^p \le (\|x\|_p + \|y\|_p)(\|x + y\|_p)^{p-1} .$$

If $x + y = 0$ there is nothing to prove; otherwise, dividing this by $(\|x + y\|_p)^{p-1}$ yields Minkowski's inequality.

We now see that Minkowski's inequality is just the requirement (N3), and thus we have shown that our norm on $\ell_p^n$ is indeed a norm. Finally, it follows from Theorem 12.8 that $\ell_p^n$ is complete and is thus a Banach space.
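To see why the restriction $p \ge 1$ matters, the Python sketch below (function names are ours) tests Minkowski's inequality on random vectors for several $p \ge 1$ and then evaluates the same formula with $p = 1/2$. For $x = (1, 0)$ and $y = (0, 1)$ the expression $(\sum_i |x_i|^p)^{1/p}$ gives $(1 + 1)^2 = 4$ for $x + y$ but only $1 + 1 = 2$ for the sum of the two "norms", so for $p < 1$ the formula does not define a norm.

```python
import random

def pnorm(x, p):
    """(sum |x_i|^p)^(1/p); a norm on l_p^n only when p >= 1."""
    return sum(abs(c) ** p for c in x) ** (1 / p)

def minkowski_holds(x, y, p, tol=1e-9):
    """Check ||x + y||_p <= ||x||_p + ||y||_p up to a small rounding tolerance."""
    s = [a + b for a, b in zip(x, y)]
    return pnorm(s, p) <= pnorm(x, p) + pnorm(y, p) + tol

random.seed(4)
for p in (1.0, 1.5, 2.0, 4.0):
    for _ in range(500):
        x = [random.uniform(-2, 2) for _ in range(5)]
        y = [random.uniform(-2, 2) for _ in range(5)]
        assert minkowski_holds(x, y, p)

print(minkowski_holds([1, 0], [0, 1], 0.5))  # False: 4 > 1 + 1
```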

We now consider the particular case of $\ell_2^n$, and define an inner product in the expected manner by

$$\langle x, y\rangle = \sum_{i=1}^n x_i^* y_i .$$

Defining the norm by

$$\|x\| = \langle x, x\rangle^{1/2} = \Big(\sum_{i=1}^n |x_i|^2\Big)^{1/2}$$

it is easy to see that $\ell_2^n$ satisfies all of the requirements for a Hilbert space. ∆
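A small Python sketch of this inner product, following the convention above that the complex conjugate falls on the first argument (so $\langle x, y\rangle$ is conjugate-linear in $x$ and linear in $y$). The helper names are ours. It checks on random complex vectors that $\langle x, x\rangle$ is real and nonnegative, that $\langle y, x\rangle = \langle x, y\rangle^*$, and that Cauchy's inequality $|\langle x, y\rangle| \le \|x\|\,\|y\|$ holds.

```python
import math
import random

def inner(x, y):
    """<x, y> = sum_i conj(x_i) * y_i  (conjugate on the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

random.seed(5)
for _ in range(500):
    x, y = rand_vec(4), rand_vec(4)
    assert abs(inner(x, x).imag) < 1e-12 and inner(x, x).real >= 0
    assert abs(inner(y, x) - inner(x, y).conjugate()) < 1e-12
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

print(inner([1 + 1j, 2], [1 + 1j, 2]))  # (6+0j): |1 + i|^2 + |2|^2 = 2 + 4
```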

In our next example, we generalize this result to the case of infinite- 
dimensional spaces. 

Example 12.8 As in the previous example, let $p$ be any real number such that $1 \le p < \infty$. We let $\ell_p$ denote the space of all sequences $x = \{x_1, x_2, \dots\}$ of scalars such that $\sum_{k=1}^\infty |x_k|^p < \infty$, and we define a norm on $\ell_p$ by

$$\|x\|_p = \Big(\sum_{k=1}^\infty |x_k|^p\Big)^{1/p} .$$

We must show that this definition also satisfies the properties of a norm, which means that we need only verify the not entirely obvious condition (N3).

From the previous example, we may write Minkowski's inequality for the space $\ell_p^n$ in the form

$$\Big(\sum_{k=1}^n |x_k + y_k|^p\Big)^{1/p} \le \Big(\sum_{k=1}^n |x_k|^p\Big)^{1/p} + \Big(\sum_{k=1}^n |y_k|^p\Big)^{1/p} .$$

Now, if $x, y \in \ell_p$, then both $(\sum_{k=1}^\infty |x_k|^p)^{1/p}$ and $(\sum_{k=1}^\infty |y_k|^p)^{1/p}$ exist since the series are convergent by definition of $\ell_p$. In particular, the right-hand side above is at most $\|x\|_p + \|y\|_p$ for every $n$, so the nondecreasing partial sums $\sum_{k=1}^n |x_k + y_k|^p$ are bounded and $x + y \in \ell_p$. Hence taking the limit of Minkowski's inequality as $n \to \infty$ shows that this inequality also applies to infinite series. (This requires the observation that the $p$th root is a continuous function so that, by Theorem 12.7(d), the limit may be taken inside the root.) In other words, the inequality $\|x + y\|_p \le \|x\|_p + \|y\|_p$ also holds in the space $\ell_p$. This shows that our definition of a norm is satisfactory. It should also be clear that Hölder's inequality similarly applies to the space $\ell_p$.

It is more difficult to show that $\ell_p$ is complete as a metric space. The origin of the problem is easily seen by referring to Theorems B3 and B8. In these theorems, we showed that a Cauchy sequence $\{x_k\}$ in $\mathbb{R}^n$ led to $n$ distinct Cauchy sequences $\{x_k^j\}$ in $\mathbb{R}$, each of which then converged to a number $x^j$ by the completeness of $\mathbb{R}$. This means that for each $j = 1, \dots, n$ there exists an $N_j$ such that $|x_k^j - x^j| < \varepsilon/\sqrt{n}$ for all $k \ge N_j$. Letting $N = \max\{N_j\}$, we see that

$$\|x_k - x\|^2 = \sum_{j=1}^n |x_k^j - x^j|^2 < n(\varepsilon^2/n) = \varepsilon^2$$

for all $k \ge N$, and hence $x_k \to x$. However, in the case of $\ell_p$ we cannot take the max of an infinite number of integers. To circumvent this problem we may proceed as follows.

To keep the notation as simple as possible and also consistent with most other authors, we let $x = \{x_1, x_2, \dots\}$ be an element of $\ell_p$ with components $x_k$, and we let $\{x^{(n)}\}$ be a sequence in $\ell_p$. Thus, the $k$th component of the vector $x^{(n)} \in \ell_p$ is given by $x_k^{(n)}$. Note that this is the opposite of our notation in the finite-dimensional case.

Let $\{x^{(n)}\}$ be a Cauchy sequence in $\ell_p$. This means that for any $\varepsilon > 0$, there exists $M > 0$ such that $m, n \ge M$ implies $\|x^{(m)} - x^{(n)}\|_p < \varepsilon$. Then, exactly as in the finite-dimensional case, for any $k = 1, 2, \dots$ we have

$$|x_k^{(m)} - x_k^{(n)}|^p \le \sum_{j=1}^\infty |x_j^{(m)} - x_j^{(n)}|^p = (\|x^{(m)} - x^{(n)}\|_p)^p < \varepsilon^p$$

and hence $|x_k^{(m)} - x_k^{(n)}| < \varepsilon$. Therefore, for each $k$ the sequence $\{x_k^{(n)}\}$ of $k$th components forms a Cauchy sequence. Since $\mathbb{R}$ (or $\mathbb{C}$) is complete, these sequences converge to a number which we denote by $x_k$. In other words, for every $k$ we have

$$\lim_{n \to \infty} x_k^{(n)} = x_k .$$

To show that $\ell_p$ is complete, we will show that the sequence $x = \{x_k\}$ is an element of $\ell_p$, and that in fact $\|x^{(n)} - x\|_p \to 0$.

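Using Minkowski's inequality, we have for every $N$ and any $n$,

$$\begin{aligned} \Big(\sum_{k=1}^N |x_k|^p\Big)^{1/p} &= \Big(\sum_{k=1}^N |x_k - x_k^{(n)} + x_k^{(n)}|^p\Big)^{1/p} \\ &\le \Big(\sum_{k=1}^N |x_k - x_k^{(n)}|^p\Big)^{1/p} + \Big(\sum_{k=1}^N |x_k^{(n)}|^p\Big)^{1/p} \\ &\le \Big(\sum_{k=1}^N |x_k - x_k^{(n)}|^p\Big)^{1/p} + \|x^{(n)}\|_p . \end{aligned} \qquad (3)$$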
Now write the $n$th term of the sequence $\{x^{(n)}\}$ as $x^{(n)} - x^{(m)} + x^{(m)}$ to obtain

$$\|x^{(n)}\|_p = \|x^{(n)} - x^{(m)} + x^{(m)}\|_p \le \|x^{(n)} - x^{(m)}\|_p + \|x^{(m)}\|_p .$$

Since $\{x^{(n)}\}$ is a Cauchy sequence, we know that given any $\varepsilon > 0$, there exists $M$ such that $m, n \ge M$ implies $\|x^{(m)} - x^{(n)}\|_p < \varepsilon$. Thus for any fixed $m \ge M$, the set $\{\|x^{(n)}\|_p : n \ge M\}$ of real numbers is bounded by $\varepsilon + \|x^{(m)}\|_p$. Moreover, we may take the max of the (finite) set of all $\|x^{(n)}\|_p$ with $n < M$. In other words, we have shown that the norms of every term in any Cauchy sequence are bounded, and hence we may write (3) as

$$\Big(\sum_{k=1}^N |x_k|^p\Big)^{1/p} \le \Big(\sum_{k=1}^N |x_k - x_k^{(n)}|^p\Big)^{1/p} + B \qquad (4)$$

where $\|x^{(n)}\|_p \le B$ for all $n$.

Since $x_k^{(n)} \to x_k$ for each of the finite number of terms $k = 1, \dots, N$, we can choose $n$ sufficiently large (but depending on $N$) that the first term on the right-hand side of (4) is $< 1$, and hence for every $N$ we have

$$\sum_{k=1}^N |x_k|^p \le (1 + B)^p .$$

This shows that the series $\sum_{k=1}^\infty |x_k|^p$ converges (its partial sums are nondecreasing and bounded), and thus by definition of $\ell_p$, the corresponding sequence $x = \{x_k\}$ is an element of $\ell_p$. We must still show that $x^{(n)} \to x$.

Since $\{x^{(n)}\}$ is a Cauchy sequence, it follows that given $\varepsilon > 0$, there exists $M$ such that $m, n \ge M$ implies $\|x^{(m)} - x^{(n)}\|_p < \varepsilon$. Then for any $N$ and all $m, n \ge M$ we have (using the Minkowski inequality again)

$$\begin{aligned} \Big(\sum_{k=1}^N |x_k - x_k^{(n)}|^p\Big)^{1/p} &\le \Big(\sum_{k=1}^N |x_k - x_k^{(m)}|^p\Big)^{1/p} + \Big(\sum_{k=1}^N |x_k^{(m)} - x_k^{(n)}|^p\Big)^{1/p} \\ &\le \Big(\sum_{k=1}^N |x_k - x_k^{(m)}|^p\Big)^{1/p} + \|x^{(m)} - x^{(n)}\|_p \\ &< \Big(\sum_{k=1}^N |x_k - x_k^{(m)}|^p\Big)^{1/p} + \varepsilon . \end{aligned} \qquad (5)$$

But $x_k^{(m)} \to x_k$ for each $k = 1, \dots, N$, and hence (by the same argument used above) we can choose $m \ge M$ sufficiently large that the first term in the last line of (5) is $< \varepsilon$. This means that for every $N$ and all $n \ge M$ (where $M$ is independent of $N$) we have

$$\Big(\sum_{k=1}^N |x_k - x_k^{(n)}|^p\Big)^{1/p} < 2\varepsilon ,$$

and hence taking the limit as $N \to \infty$ yields

$$\|x - x^{(n)}\|_p = \Big(\sum_{k=1}^\infty |x_k - x_k^{(n)}|^p\Big)^{1/p} \le 2\varepsilon .$$

Since this inequality holds for all $n \ge M$, we have shown that $\|x - x^{(n)}\|_p \to 0$ or, alternatively, that $x^{(n)} \to x$. We have therefore shown that the space $\ell_p$ is complete, i.e., it is a Banach space.

It is now easy to show that $\ell_2$ is a Hilbert space. To see this, we define the inner product on $\ell_2$ in the usual way by

$$\langle x, y\rangle = \sum_{k=1}^\infty x_k^* y_k .$$

Using the infinite-dimensional version of Hölder's inequality with $p = q = 2$ (i.e., Cauchy's inequality), we see that this series converges absolutely, and hence the series converges to a complex number (see Theorem B20). This shows that the inner product so defined is meaningful. The rest of the verification that $\ell_2$ is a Hilbert space is straightforward and left to the reader (see Exercise 12.3.2). ∆

Recall that a subset $A$ of a metric space $X$ is said to be dense if $\operatorname{Cl} A = X$. Intuitively, this simply means that any neighborhood of any $x \in X$ contains points of $A$ (see Theorems B13 and B15). A space is said to be separable if it contains a countable dense subset. An important class of Hilbert spaces consists of those that are separable.

Example 12.9 Let us show that the space $\ell_2$ is actually separable. In other words, we shall show that $\ell_2$ contains a countable dense subset. To see this, we say that a point $x = \{x_1, x_2, \dots\} \in \ell_2$ is a rational point if $x_n \ne 0$ for only a finite number of the indices $n$, and if each of these nonzero components is a rational (complex) number, i.e., a number whose real and imaginary parts are both rational. It should be clear that the set of all such rational points is countably infinite. We must now show that any neighborhood of any point in $\ell_2$ contains at least one rational point.

To do this, we show that given any $x \in \ell_2$ and any $\varepsilon > 0$, there exists a rational point $r = \{r_1, r_2, \dots, 0, 0, \dots\} \in \ell_2$ such that $\|r - x\|_2 < \varepsilon$. Since $x \in \ell_2$ the series $\sum_{k=1}^\infty |x_k|^2$ converges, and hence there exists $N$ such that

$$\sum_{k=N+1}^\infty |x_k|^2 < \varepsilon^2/2$$

(see Theorem B17). Next, for each $k = 1, \dots, N$ we find a rational number $r_k$ with the property that

$$|r_k - x_k| < \frac{\varepsilon}{\sqrt{2N}} .$$

(That this can be done follows from Theorem 0.4 applied to both the real and imaginary parts of $x_k$.) Then the distance between $x$ and the rational point $r = \{r_1, r_2, \dots, r_N, 0, 0, \dots\}$ is given by

$$\|r - x\|_2 = \Big(\sum_{k=1}^N |r_k - x_k|^2 + \sum_{k=N+1}^\infty |x_k|^2\Big)^{1/2} < \big[N(\varepsilon^2/2N) + \varepsilon^2/2\big]^{1/2} = \varepsilon .$$
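The construction in this example can be imitated numerically. The Python sketch below assumes, purely for illustration, the real vector $x_k = 2^{-k/2}$ in $\ell_2$, whose tail $\sum_{k > N} |x_k|^2 = 2^{-N}$ is known in closed form. Following the proof, it picks $N$ with $2^{-N} < \varepsilon^2/2$, replaces $x_1, \dots, x_N$ by fractions $r_k$ with a common denominator $D$ chosen so that $|r_k - x_k| < 1/D < \varepsilon/\sqrt{2N}$, and evaluates $\|r - x\|_2$. The helper names are hypothetical, and floating-point values stand in for the real numbers $x_k$.

```python
import math
from fractions import Fraction

def x(k):
    """Sample vector in l_2 (our choice): x_k = 2**(-k/2), so ||x||_2^2 = 1."""
    return 2.0 ** (-k / 2)

def rational_point(eps):
    """Follow the proof: choose N, then rational r_1, ..., r_N close to x_1, ..., x_N."""
    N = 1
    while 2.0 ** (-N) >= eps ** 2 / 2:      # tail sum_{k>N} |x_k|^2 equals 2**(-N)
        N += 1
    D = math.ceil(math.sqrt(2 * N) / eps) + 1   # ensures 1/D < eps / sqrt(2N)
    r = [Fraction(math.floor(x(k) * D), D) for k in range(1, N + 1)]
    return N, r

def distance(r, N):
    """||r - x||_2, using the exact tail 2**(-N) for the components k > N."""
    head = sum((float(rk) - x(k)) ** 2 for k, rk in enumerate(r, start=1))
    tail = 2.0 ** (-N)
    return math.sqrt(head + tail)

eps = 1e-3
N, r = rational_point(eps)
print(N, distance(r, N) < eps)  # 21 True
```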



As the last remark of this section, the reader should note that the proof of the Cauchy-Schwarz inequality in Example 12.1 made no reference whatsoever to any components, and thus it clearly holds in any Hilbert space, as does the parallelogram law. Furthermore, as mentioned in Example 12.3, the Cauchy-Schwarz inequality also shows that the inner product is continuous in each variable. Indeed, applying Theorem 12.7(d) we see that if $x_n \to x$ and $y_n \to y$, then

$$\begin{aligned} |\langle x_n, y_n\rangle - \langle x, y\rangle| &= |\langle x_n - x, y_n - y\rangle + \langle x_n - x, y\rangle + \langle x, y_n - y\rangle| \\ &\le \|x_n - x\|\,\|y_n - y\| + \|x_n - x\|\,\|y\| + \|x\|\,\|y_n - y\| \to 0 . \end{aligned}$$

This is sometimes expressed by saying that the inner product is jointly continuous. Alternatively, we can note that

$$|\langle x_1, y\rangle - \langle x_2, y\rangle| = |\langle x_1 - x_2, y\rangle| \le \|x_1 - x_2\|\,\|y\|$$

which shows that the map $x \mapsto \langle x, y\rangle$ is actually uniformly continuous, with the same result holding for $y \mapsto \langle x, y\rangle$.

Exercises 

1. If $x_n = \sum_{k=1}^n a_k v_k \to x$ and $y_n = \sum_{k=1}^n b_k v_k \to y$, show that $x_n + y_n \to x + y$.

2. Complete the proof (begun in Example 12.8) that $\ell_2$ is a Hilbert space. [Hint: Note that if $x = \{x_1, x_2, \dots\}$ and $y = \{y_1, y_2, \dots\}$ are vectors in $\ell_2$, then you must show that $x + y \in \ell_2$ also.]

3. Prove that every compact metric space $(X, d)$ is separable. [Hint: For each integer $n \ge 1$ consider the collection $U_n$ of open spheres $U_n = \{B(x, 1/n) : x \in X\}$.]

4. Let $H$ be a Hilbert space and suppose $A \in \mathcal{L}(H)$ is a positive symmetric operator. Prove the generalized Schwarz inequality:

$$|\langle Ax, y\rangle|^2 \le \langle Ax, x\rangle\langle Ay, y\rangle$$

where $x, y \in H$. [Hint: Let $c$ be a real number and consider the vector $z = x + c\langle Ax, y\rangle y$.]

5. Let $\ell_\infty$ denote the linear space consisting of all bounded sequences $x = \{x_1, x_2, \dots, x_n, \dots\}$ of scalars with norm $\|x\| = \sup |x_n|$. Show that $\ell_\infty$ is a Banach space.

12.4 CLOSED SUBSPACES 

Since the norm on a vector space induces a metric topology on the space (i.e., defines the open sets in terms of the induced metric), it makes sense to define a closed subspace as a subspace which is a closed set relative to the metric topology. In view of Theorem B14, we say that a set $A$ of vectors is closed if every sequence of vectors in $A$ that converges in the whole space has its limit in $A$. If $E$ is a vector space, many authors define a linear manifold to be a subset $S \subset E$ of vectors such that $S$ is also a linear space. In this case, a subspace is defined to be a closed linear manifold. From the corollary to Theorem 12.9, we then see that any finite-dimensional linear manifold over either $\mathbb{C}$ or $\mathbb{R}$ is a subspace. We mention this terminology only in passing, and will generally continue to use the word "subspace" in our previous context (i.e., as a linear manifold). As a simple example, let $V = \mathbb{R}$ be a vector space over the field $\mathbb{Q}$. Then the subspace $W \subset V$ defined by $W = \mathbb{Q}$ is not closed (why?). Since $W$ is one-dimensional over $\mathbb{Q}$, this example shows that the restriction to real or complex scalars in the corollary to Theorem 12.9 cannot simply be dropped.

Recall from Theorem 2.22 that if $W$ is a subspace of a finite-dimensional inner product space $V$, then $V = W \oplus W^\perp$. We now wish to prove that if $M$ is a closed subspace of a Hilbert space $H$, then $H = M \oplus M^\perp$. Unfortunately, this requires that we prove several preliminary results along the way. We begin with a brief discussion of convex sets.

We say that a subset $S$ of a vector space $V$ is convex if for every pair $x, y \in S$ and any real number $t \in [0, 1]$, the vector

$$z = (1 - t)x + ty$$

is also an element of $S$. Intuitively, this just says that the straight line segment from $x$ to $y$ in $V$ is in fact contained in $S$. It should be obvious that the intersection of any collection of convex sets is convex, and that every subspace of $V$ is necessarily convex.

It follows by induction that if $S$ is convex and $x_1, \dots, x_n \in S$, then the vector $t_1x_1 + \cdots + t_nx_n$, where $0 \le t_i \le 1$ and $t_1 + \cdots + t_n = 1$, is also in $S$. In addition, for any fixed $x_1, \dots, x_n$, the set of all such linear combinations is itself a convex set. It is trivial to verify that if $S$ is convex, then so is any translate

$$S + z = \{x + z : x \in S\} \qquad (z \in V \text{ fixed}) .$$

Moreover, if $\lambda\colon V \to W$ is a linear map and $S \subset V$ is convex, then $\lambda(S)$ is a convex subset of $W$, and if $T \subset W$ is convex, then $\lambda^{-1}(T)$ is convex in $V$. We leave the proofs of these elementary facts to the reader (see Exercise 12.4.1).

The main result dealing with convex sets that we shall need is given in the 
next theorem. 

Theorem 12.14 Every nonempty closed convex subset $S$ of a Hilbert space
$H$ contains a unique vector of smallest norm. In other words, there exists a
unique $x_0 \in S$ with the property that $\|x_0\| \le \|x\|$ for every $x \in S$.

Proof Let $\delta = \inf\{\|x\| : x \in S\}$. By definition of inf, this implies the existence
of a sequence $\{x_n\}$ of vectors in $S$ such that $\|x_n\| \to \delta$. Since $S$ is convex,
$(x_n + x_m)/2$ is also in $S$ (take $t = 1/2$ in the definition of convex set), and hence
$\|(x_n + x_m)/2\| \ge \delta$ or $\|x_n + x_m\| \ge 2\delta$. Applying the parallelogram law we see
that

$$\|x_n - x_m\|^2 = 2\|x_n\|^2 + 2\|x_m\|^2 - \|x_n + x_m\|^2 \le 2\|x_n\|^2 + 2\|x_m\|^2 - 4\delta^2 .$$

As $m, n \to \infty$ the right hand side of this inequality tends to $2\delta^2 + 2\delta^2 - 4\delta^2 = 0$, which shows
that $\|x_n - x_m\| \to 0$, and hence $\{x_n\}$ is a Cauchy sequence in $S$. By Theorem
12.9, $S$ is complete (being a closed subset of the complete space $H$), and thus there exists a vector $x_0 \in S$ such that $x_n \to x_0$.
Since the norm is a continuous function, we see that (see Examples 12.2 and
12.3 or Theorem 12.7(d))

$$\|x_0\| = \|\lim x_n\| = \lim \|x_n\| = \delta .$$

Thus $x_0 \in S$ is a vector with smallest norm $\delta = \inf\{\|x\| : x \in S\}$.

To show that this $x_0$ is unique, suppose that $y \in S$ is such that $\|y\| = \delta$.
Applying the parallelogram law again to the vectors $x_0/2$ and $y/2$ yields

$$\|x_0 - y\|^2/4 = \|x_0\|^2/2 + \|y\|^2/2 - \|(x_0 + y)/2\|^2 .$$

But $(x_0 + y)/2 \in S$ implies $\|(x_0 + y)/2\| \ge \delta$, and thus we have

$$\|x_0 - y\|^2 \le 2\|x_0\|^2 + 2\|y\|^2 - 4\delta^2 .$$

If $\|x_0\| = \|y\| = \delta$, then the right hand side is zero, and this inequality implies that $x_0 = y$. ∎
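
As a concrete finite-dimensional illustration, suppose (our assumption, not the text's) that $H = \mathbb{R}^3$ with the usual dot product and $S$ is the affine hyperplane $\{x : (a, x) = b\}$ with $a \neq 0$. This set is nonempty, closed and convex, and its vector of smallest norm has the closed form $x_0 = (b/\|a\|^2)\,a$. The NumPy sketch below samples points of $S$ and checks that none is shorter than $x_0$. It also checks that the parallelogram-law bound used in the uniqueness argument holds. The helper names are hypothetical.

```python
import numpy as np

def min_norm_point_on_hyperplane(a, b):
    """Hypothetical helper: smallest-norm vector in S = {x : a.x = b}, assuming a != 0."""
    a = np.asarray(a, dtype=float)
    return (b / a.dot(a)) * a

def random_points_on_hyperplane(a, b, count, rng):
    """Sample points of S by moving from x0 along directions orthogonal to a."""
    x0 = min_norm_point_on_hyperplane(a, b)
    v = rng.standard_normal((count, a.size))
    # Remove each direction's component along a, so that a.(x0 + w) = b still holds.
    w = v - np.outer(v @ a / a.dot(a), a)
    return x0 + w

rng = np.random.default_rng(0)
a = np.array([1.0, 2.0, -2.0])
b = 6.0
x0 = min_norm_point_on_hyperplane(a, b)   # (2/3, 4/3, -4/3)
delta = np.linalg.norm(x0)                # inf{||x|| : x in S}, here 2 up to rounding
pts = random_points_on_hyperplane(a, b, 1000, rng)

print(np.allclose(pts @ a, b))                              # True: every sample lies in S
print(np.linalg.norm(pts, axis=1).min() >= delta - 1e-12)   # True: none is shorter than x0

# The bound from the proof, for two points y, z of S:
# ||y - z||^2 <= 2||y||^2 + 2||z||^2 - 4 delta^2.
y, z = pts[0], pts[1]
lhs = np.linalg.norm(y - z) ** 2
rhs = 2 * y.dot(y) + 2 * z.dot(z) - 4 * delta ** 2
print(lhs <= rhs + 1e-9)                                    # True
```

A hyperplane is chosen only because its minimizer is known exactly. For a general closed convex set, the minimizer has to be found numerically.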

The notion of orthogonality is extremely important in the theory of Hilbert
spaces. We recall from Section 12.1 that two vectors $x$ and $y$ in a Hilbert
space $H$ are said to be orthogonal if $(x, y) = 0$, and we write this as $x \perp y$. If $S$
is a (nonempty) subset of $H$ and $x \in H$ is orthogonal to every $y \in S$, then we
express this by writing $x \perp S$. Thus the orthogonal complement $S^\perp$ of $S$ is
defined by $S^\perp = \{x \in H : x \perp S\}$.

As an example, we consider the orthogonal complement $x^\perp$ of any $x \in H$.
If $x \perp y$ and $x \perp z$, then $x \perp (y + z)$ and $x \perp (ay)$ for any scalar $a$. Therefore
$x^\perp$ is actually a subspace of $H$. If we define a continuous linear map $f_x: H \to \mathbb{C}$
by $f_x(y) = (x, y)$, then $x^\perp = \{y \in H : f_x(y) = 0\} = \operatorname{Ker} f_x$. Moreover, if $\{y_n\}$ is a sequence
in $x^\perp$ that converges to an element $y \in H$, then the continuity of the inner
product yields

$$(x, y) = (x, \lim y_n) = \lim\, (x, y_n) = 0$$

and hence $y \in x^\perp$ also. This proves that $x^\perp$ is in fact a closed subspace of $H$.
Carrying this idea a little further, if $S$ is a subset of $H$, then we can clearly
write $S^\perp = \bigcap_{x \in S} x^\perp$. Since this shows that $S^\perp$ is the intersection of closed
subspaces, it follows that $S^\perp$ must also be a closed subspace (see Exercise
12.4.2). Alternatively, if $y \in S$ and $\{x_n\} \subset S^\perp$ with $x_n \to x$, then we again
have $(x, y) = \lim\, (x_n, y) = 0$, so that $x \in S^\perp$ and $S^\perp$ is therefore closed. We
leave it to the reader to prove the following (see Exercise 12.4.3):

$\{0\}^\perp = H$ and $H^\perp = \{0\}$.

$S \cap S^\perp \subset \{0\}$.

$S \subset S^{\perp\perp} = (S^\perp)^\perp$.

$S_1 \subset S_2$ implies $S_2^\perp \subset S_1^\perp$.

Furthermore, using the next theorem, it is not hard to show that a subspace $M$ of
a Hilbert space $H$ is closed if and only if $M^{\perp\perp} = M$ (see Exercise 12.4.4).

Theorem 12.15 Let $M$ be a proper closed subspace of a Hilbert space $H$ (i.e.,
$M \neq H$). Then there exists a nonzero vector $x_0 \in H$ such that $x_0 \perp M$.

Proof Suppose $x \in H$ and $x \notin M$. Since any subspace is automatically
convex, it follows that the set $x - M = \{x - y : y \in M\}$ is closed and convex.
By Theorem 12.14, this set contains a unique vector $x_0 = x - y_0 \in x - M$ of
smallest norm. By definition, this means that $\|x - y_0\| = \inf\{\|x - y\| : y \in M\}$. If
we had $\|x - y_0\| = 0$, then $x$ would be an accumulation point of $M$, contradicting
the assumption that $M$ is closed and $x \notin M$. Thus we must have $x_0 \neq 0$,
and we claim that $x_0 \perp M$.

Since $x_0$ is of smallest norm and $x_0 + ay = x - (y_0 - ay) \in x - M$, we see that for any $y \in M$ and any $a \in \mathbb{C}$ we
have

$$\|x_0\|^2 \le \|x_0 + ay\|^2 .$$

Expanding this out in terms of the inner product on $H$ we find that

$$0 \le 2\operatorname{Re}\{a(x_0, y)\} + |a|^2\|y\|^2 .$$

In particular, if we let $a = c(y, x_0)$ where $c \in \mathbb{R}$ is nonzero, then this inequality
becomes

$$0 \le c\,|(x_0, y)|^2\,(2 + c\|y\|^2) .$$

If $y \in M$ is such that $(x_0, y) \neq 0$, then the fact that this inequality holds for all
nonzero $c \in \mathbb{R}$ leads to a contradiction if we choose $c$ such that $-2/\|y\|^2 < c < 0$, since the right hand side is then strictly negative. It therefore follows that we must have $(x_0, y) = 0$ for every $y \in M$, and
hence $x_0 \perp M$. ∎
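
In $\mathbb{C}^n$, every subspace is closed, and the nearest point $y_0 \in M$ to $x$ can be computed by least squares. Here $\mathbb{C}^n$ is a finite-dimensional Hilbert space with $(x, y) = \sum_k \overline{x_k}\,y_k$, which is conjugate-linear in the first argument as in this section. The NumPy sketch below assumes, for illustration only, that $M$ is the column space of a random complex matrix. It forms $x_0 = x - y_0$ as in the proof and checks two things: that $x_0 \perp M$, and that $\|x_0\|^2 \le \|x_0 + ay\|^2$ for sampled $y \in M$. `np.vdot` conjugates its first argument, which matches the convention used here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
# Illustrative assumption: M is the column space of a random n x k complex matrix B.
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # generically x is not in M

# y0 in M minimizes ||x - y|| over y in M: the least-squares solution of B c = x.
coef, *_ = np.linalg.lstsq(B, x, rcond=None)
y0 = B @ coef
x0 = x - y0                      # the vector of smallest norm in x - M

# x0 is orthogonal to each column of B, hence to all of M.
inner_products = np.array([np.vdot(B[:, j], x0) for j in range(k)])
print(np.allclose(inner_products, 0))     # True

# x0 + a*y = x - (y0 - a*y) lies in x - M, so its norm cannot be smaller than ||x0||.
for _ in range(100):
    y = B @ (rng.standard_normal(k) + 1j * rng.standard_normal(k))
    a = complex(rng.standard_normal(), rng.standard_normal())
    assert np.linalg.norm(x0) ** 2 <= np.linalg.norm(x0 + a * y) ** 2 + 1e-9
```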

We are now in a position to prove our earlier assertion. After the proof we 
shall give some background as to why this result is important. 



Theorem 12.16 Let $M$ be a closed subspace of a Hilbert space $H$. Then $H = M \oplus M^\perp$.

Proof We first show that $M + M^\perp$ is a closed subspace of $H$. To see this, we
note that $M \cap M^\perp = \{0\}$ (if $x \in M \cap M^\perp$, then $(x, x) = 0$ and hence $x = 0$), and every $z \in M + M^\perp$ may be written in the unique form $z = x + y$
with $x \in M$ and $y \in M^\perp$. (See the proof of Theorem 2.12. Note also that we
have not yet shown that every $z \in H$ is of this form.) Now let $\{z_n\} = \{x_n + y_n\}$ be any sequence in $M + M^\perp$ that converges to an element $z \in H$.
We must show that $z \in M + M^\perp$. Using the Pythagorean theorem we see that

$$\begin{aligned}
\|z_m - z_n\|^2 &= \|(x_m + y_m) - (x_n + y_n)\|^2 \\
&= \|(x_m - x_n) + (y_m - y_n)\|^2 \\
&= \|x_m - x_n\|^2 + \|y_m - y_n\|^2
\end{aligned}$$

and therefore $\{z_n\}$ is a Cauchy sequence in $M + M^\perp$ if and only if $\{x_n\}$ is a
Cauchy sequence in $M$ and $\{y_n\}$ is a Cauchy sequence in $M^\perp$. Since both $M$
and $M^\perp$ are closed, they are complete (Theorem 12.9). Now, since $\{z_n\}$ is
a convergent sequence in $H$, it is a Cauchy sequence in $H$, and in fact it is a
Cauchy sequence in $M + M^\perp$ since every $z_n = x_n + y_n \in M + M^\perp$. But then
$\{x_n\}$ and $\{y_n\}$ are Cauchy sequences in $M$ and $M^\perp$, which must converge to
points $x \in M$ and $y \in M^\perp$. Hence

$$z = \lim z_n = \lim (x_n + y_n) = \lim x_n + \lim y_n = x + y \in M + M^\perp .$$

This shows that $M + M^\perp$ is a closed subspace of $H$.

We now claim that $H = M + M^\perp$. Since we already know that $M \cap M^\perp = \{0\}$, this will complete the proof that $H = M \oplus M^\perp$. If $H \neq M + M^\perp$, then
according to Theorem 12.15 there exists a nonzero $z_0 \in H$ with the property
that $z_0 \perp (M + M^\perp)$. But this implies that $z_0 \in M^\perp$ and $z_0 \in M^{\perp\perp}$, and hence

$$\|z_0\|^2 = (z_0, z_0) = 0$$

(or observe that $z_0 \in M^\perp \cap M^{\perp\perp} = \{0\}$), which contradicts
the assumption that $z_0 \neq 0$. ∎

To gain a little insight as to why this result is important, we recall our
discussion of projections in Section 7.8. In particular, Theorem 7.27 shows that a
linear transformation $E$ on a finite-dimensional vector space $V$ is idempotent
(i.e., $E^2 = E$) if and only if $V = U \oplus W$, where $E$ is the projection of $V$ on $U = \operatorname{Im} E$ in the direction of $W = \operatorname{Ker} E$. In order to generalize this result to
Banach spaces, we define an operator on a Banach space $B$ to be an element
of $L(B, B)$. In other words, an operator on $B$ is a continuous linear
transformation of $B$ into itself. A projection on $B$ is an idempotent operator on $B$.
Thus, in order that a linear map $P$ on $B$ be a projection, it must obey both the
algebraic requirement that $P^2 = P$ and the topological condition of
continuity. The generalization of Theorem 7.27 to Banach spaces is given by
the next two theorems, the proofs of which are left to the reader since we will
not be needing them again.

Theorem 12.17 Let $P$ be a projection on a Banach space $B$ and let $M = \operatorname{Im} P$
and $N = \operatorname{Ker} P$. Then $M$ and $N$ are closed subspaces of $B$, and $B = M \oplus N$.

Proof See Exercise 12.4.5. ∎

Theorem 12.18 Let $B$ be a Banach space and let $M, N$ be closed subspaces
of $B$ such that $B = M \oplus N$. Then for any $z = x + y \in M \oplus N$, the mapping $P$
defined by $P(z) = x$ is a projection on $B$ with $\operatorname{Im} P = M$ and $\operatorname{Ker} P = N$.

Proof The only difficult part of this theorem is the proof that P is continuous. 
While this may be proved using only what has been covered in this book 
(including the appendices), it is quite involved since it requires proving both 
Baire's theorem and the open mapping theorem. Since these are essentially 
purely topological results whose proofs are of no benefit to us at this point, we 
choose to refer the interested reader to, e.g., the very readable treatment by 
Simmons (1963). ∎

As mentioned in Section 7.8, if we are given a space $V$ and subspace $U$,
there may be many subspaces $W$ with the property that $V = U \oplus W$. Thus, if
we are given a closed subspace $M$ of a Banach space $B$, then there could be
many algebraic projections defined on $B$ with image $M$, and in fact none of
them may be projections as defined above (i.e., they may not be continuous).
In other words, there may not exist any closed subspace $N$ such that $B = M \oplus N$. However, Theorem 12.16 together with Theorem 12.18 shows that if we
have a Hilbert space $H$ together with a closed subspace $M$, then there always
exists a projection $P$ defined on $H = M \oplus M^\perp$ with $\operatorname{Im} P = M$ and $\operatorname{Ker} P = M^\perp$.
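
The non-uniqueness of algebraic projections is already visible in $\mathbb{R}^2$. In the NumPy sketch below, the particular matrices are illustrative assumptions. Both `P_orth` and `P_obl` are idempotent with image the line $M$ spanned by $(1, 0)$. Only `P_orth` has kernel $M^\perp$; `P_obl` projects along the line spanned by $(1, -1)$ instead. In finite dimensions every linear map is continuous, so both are projections in the sense above. The example therefore only shows how the choice of complement $N$ determines $P$. It does not reproduce the continuity issue that can arise in infinite dimensions.

```python
import numpy as np

# M = span{(1, 0)}; each choice of complement N gives a different projection onto M.
P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])   # projects along N = M^perp = span{(0, 1)}
P_obl = np.array([[1.0, 1.0],
                  [0.0, 0.0]])    # projects along N = span{(1, -1)}

z = np.array([2.0, 3.0])
for P in (P_orth, P_obl):
    assert np.allclose(P @ P, P)          # idempotent: P^2 = P
    assert np.isclose((P @ z)[1], 0.0)    # the image lies in M

print(P_orth @ np.array([0.0, 1.0]))      # [0. 0.]: (0, 1) spans Ker P_orth
print(P_obl @ np.array([1.0, -1.0]))      # [0. 0.]: (1, -1) spans Ker P_obl

# Only the orthogonal projection leaves a residual z - Pz orthogonal to M.
for name, P in (("orthogonal", P_orth), ("oblique", P_obl)):
    residual = z - P @ z
    print(name, np.dot(residual, [1.0, 0.0]))   # 0.0 for orthogonal, -3.0 for oblique
```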

Exercises 

1. Let $V$ and $W$ be vector spaces, and let $S \subset V$ be convex.

(a) Show that the intersection of any collection of convex sets is convex.

(b) Show that every subspace of $V$ is convex.

(c) Show that any translate $S + z = \{x + z : z \in V \text{ is fixed and } x \in S\}$ is
convex.

(d) If $A: V \to W$ is linear, show that $A(S)$ is a convex subset of $W$, and if
$T \subset W$ is convex, then $A^{-1}(T)$ is convex in $V$.

2. Let $H$ be a Hilbert space and $S$ a nonempty subset of $H$. Show that $S^\perp = \bigcap_{x \in S} x^\perp$ is a closed subspace of $H$.

3. Let $H$ be a Hilbert space and $S$ a nonempty subset of $H$. Prove the
following:

(a) $\{0\}^\perp = H$ and $H^\perp = \{0\}$.

(b) $S \cap S^\perp \subset \{0\}$.

(c) $S \subset S^{\perp\perp}$.

(d) $S_1 \subset S_2$ implies $S_2^\perp \subset S_1^\perp$.

4. Show that a subspace $M$ of a Hilbert space $H$ is closed if and only if $M^{\perp\perp} = M$.

5. Prove Theorem 12.17. 



12.5 HILBERT BASES 

Let us now turn our attention to the infinite-dimensional analogue of the
expansion of a vector in terms of a basis. (We recommend that the reader first
review Sections 0.3 and 0.4 before continuing on with this material.) Suppose
that $x$ is a vector in a Hilbert space $H$ such that $\|x\| \neq 0$, and let $y \in H$ be
arbitrary. We claim there exists a unique scalar $c$ such that $y - cx$ is orthogonal to
$x$. Indeed, if $(y - cx) \perp x$, then

$$0 = (x, y - cx) = (x, y) - c(x, x)$$

implies that

$$c = (x, y)/(x, x)$$

while if $c = (x, y)/(x, x)$, then reversing the argument shows that $(y - cx) \perp x$.
The scalar $c$ is usually called the Fourier coefficient of $y$ with respect to (or
relative to) $x$.

To extend this idea to finite sets of vectors, let $\{x_i\} = \{x_1, \dots, x_n\}$ be a
collection of vectors in $H$. Furthermore assume that the $x_i$ are mutually
orthogonal, i.e., $(x_i, x_j) = 0$ if $i \neq j$. If $c_j = (x_j, y)/(x_j, x_j)$ is the Fourier
coefficient of $y \in H$ with respect to $x_j$, then for each $i = 1, \dots, n$

$$\Big(x_i,\; y - \sum_{j=1}^n c_j x_j\Big) = (x_i, y) - \sum_{j=1}^n c_j (x_i, x_j) = (x_i, y) - c_i (x_i, x_i) = 0$$

which shows that $y - \sum_{j=1}^n c_j x_j$ is orthogonal to each of the $x_i$. Geometrically,
this result says that if we subtract off the components of a vector $y$ in the
direction of $n$ orthogonal vectors $x_i$, then the resulting vector is orthogonal to
each of the vectors $x_i$.
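
The computation above is easy to check numerically. This sketch assumes $H = \mathbb{C}^4$ with $(x, y) = \sum_k \overline{x_k}\,y_k$, which is NumPy's `np.vdot` and conjugates its first argument. It builds three mutually orthogonal vectors that are not normalized. It then verifies that subtracting the components $c_j x_j$, with $c_j = (x_j, y)/(x_j, x_j)$, leaves a vector orthogonal to every $x_i$. The helper name `fourier_coefficient` is hypothetical.

```python
import numpy as np

def fourier_coefficient(x, y):
    """Hypothetical helper: c = (x, y)/(x, x), where (u, v) = sum(conj(u) * v)."""
    return np.vdot(x, y) / np.vdot(x, x)

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3)))
# Three mutually orthogonal vectors of lengths 1, 2 and 3 (orthogonal, not orthonormal).
xs = [s * Q[:, j] for j, s in enumerate([1.0, 2.0, 3.0])]
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

cs = [fourier_coefficient(xj, y) for xj in xs]
residual = y - sum(c * xj for c, xj in zip(cs, xs))

# Subtracting the components c_j x_j leaves a vector orthogonal to every x_i.
print(all(abs(np.vdot(xi, residual)) < 1e-12 for xi in xs))   # True
```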

We can easily simplify many of our calculations by requiring that our
finite set $\{x_i\}$ be orthonormal instead of just orthogonal. In other words, we
assume that $(x_i, x_j) = \delta_{ij}$, which is equivalent to requiring that $x_i \perp x_j$ for $i \neq j$
and $\|x_i\| = 1$ for every $i$. Note that given any $x_i \in H$ with $\|x_i\| \neq 0$, we can normalize
$x_i$ by forming the vector $e_i = x_i/\|x_i\|$. It is then easy to see that the above
calculations remain unchanged except that now we simply have $c_i = (e_i, y)$. We
will usually denote such an orthonormal set by $\{e_i\}$, and hence we write

$$(e_i, e_j) = \delta_{ij} .$$

Suppose $\{e_i\}$ is a finite orthonormal set in a Hilbert space $H$ and $x$ is any
element of $H$. We claim that the expression

$$\Big\|x - \sum_{k=1}^n a_k e_k\Big\|$$

achieves its minimum value in the case where each of the scalars $a_k$ is equal to
the Fourier coefficient $c_k = (e_k, x)$. To see this, we note that the above
discussion showed that $x - \sum_{k=1}^n c_k e_k$ is orthogonal to each $e_i$ for $i = 1, \dots, n$, and
hence we may apply the Pythagorean theorem to obtain

$$\begin{aligned}
\Big\|x - \sum_{k=1}^n a_k e_k\Big\|^2 &= \Big\|x - \sum_{k=1}^n c_k e_k + \sum_{k=1}^n (c_k - a_k) e_k\Big\|^2 \\
&= \Big\|x - \sum_{k=1}^n c_k e_k\Big\|^2 + \Big\|\sum_{k=1}^n (c_k - a_k) e_k\Big\|^2 .
\end{aligned}$$

Since the last term equals $\sum_{k=1}^n |c_k - a_k|^2 \ge 0$, the right hand side of this equation takes its minimum value at
$a_k = c_k$ for $k = 1, \dots, n$, and hence we see that in general

$$\Big\|x - \sum_{k=1}^n c_k e_k\Big\| \le \Big\|x - \sum_{k=1}^n a_k e_k\Big\|$$

for any set of scalars $a_k$. Moreover, we see that (using $c_k = (e_k, x)$)

$$\begin{aligned}
0 \le \Big\|x - \sum_{k=1}^n c_k e_k\Big\|^2
&= \Big(x - \sum_{k=1}^n c_k e_k,\; x - \sum_{r=1}^n c_r e_r\Big) \\
&= \|x\|^2 - \sum_{k=1}^n c_k^*(e_k, x) - \sum_{r=1}^n c_r (x, e_r) + \sum_{k=1}^n |c_k|^2 \\
&= \|x\|^2 - \sum_{k=1}^n |c_k|^2
\end{aligned}$$

which implies

$$\sum_{k=1}^n |c_k|^2 = \sum_{k=1}^n |(e_k, x)|^2 \le \|x\|^2 .$$

This relationship is frequently called Bessel's inequality, although this
designation also applies to the infinite-dimensional version to be proved below.
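
Both facts can be checked numerically under some assumptions of ours. We take $H = \mathbb{C}^6$, with the inner product conjugate-linear in the first argument as in `np.vdot`. We take an orthonormal set of $n = 3$ vectors from a QR factorization and compute $c_k = (e_k, x)$. The sketch verifies three things: Bessel's inequality, the identity $\|x - \sum c_k e_k\|^2 = \|x\|^2 - \sum |c_k|^2$, and that perturbed coefficients $a_k$ never do better than the $c_k$.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n = 6, 3
# The columns of E form an orthonormal set {e_1, e_2, e_3} in C^6 (not a basis).
E, _ = np.linalg.qr(rng.standard_normal((dim, n)) + 1j * rng.standard_normal((dim, n)))
x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)

# Fourier coefficients c_k = (e_k, x); E.conj().T @ x computes all of them at once.
c = E.conj().T @ x
best_error = np.linalg.norm(x - E @ c)

# Bessel's inequality and the identity derived above.
print(np.sum(np.abs(c) ** 2) <= np.linalg.norm(x) ** 2)                              # True
print(np.isclose(best_error ** 2, np.linalg.norm(x) ** 2 - np.sum(np.abs(c) ** 2)))  # True

# Any other choice of coefficients a_k gives a larger (or equal) error.
for _ in range(1000):
    a = c + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    assert np.linalg.norm(x - E @ a) >= best_error
```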

We now seek to generalize these last two results to the case of arbitrary 
(i.e., possibly uncountable) orthonormal sets. We begin with a simple 
theorem. 

Theorem 12.19 Let $\{e_i\}$, $i \in I$ (where $I$ may be uncountable), be an arbitrary
orthonormal set in a Hilbert space $H$. Then if $x$ is any vector in $H$, the set $S = \{e_i : (e_i, x) \neq 0\}$ is countable (but possibly empty).

Proof For each $n \in \mathbb{Z}^+$ define the set

$$S_n = \{e_i : |(e_i, x)|^2 > \|x\|^2/n\} .$$

We claim that each $S_n$ can contain at most $n - 1$ vectors. To see this, suppose
$S_n$ contains $N$ distinct vectors, say $e_1, \dots, e_N$. Then from the definition of $S_n$
we have

$$\sum_{i=1}^N |(e_i, x)|^2 > (\|x\|^2/n)N$$

while Bessel's inequality shows that

$$\sum_{i=1}^N |(e_i, x)|^2 \le \|x\|^2 .$$

Thus we must have $N < n$, which is the same as requiring that $N \le n - 1$. The
theorem now follows if we note that each $S_n$ consists of a finite number of
vectors, and that $S = \bigcup_{n=1}^\infty S_n$, since any $e_i$ with $(e_i, x) \neq 0$ satisfies
$|(e_i, x)|^2 > \|x\|^2/n$ for all sufficiently large $n$. ∎
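
The counting bound $|S_n| \le n - 1$ can be observed directly. As a finite-dimensional stand-in (an illustrative assumption, since the interesting case of the theorem is infinite-dimensional), we take $H = \mathbb{C}^{50}$ with an orthonormal basis from a QR factorization. The sketch counts the $e_i$ with $|(e_i, x)|^2 > \|x\|^2/n$ for several values of $n$ and checks the bound from the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 50
E, _ = np.linalg.qr(rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim)))
x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)

weights = np.abs(E.conj().T @ x) ** 2     # |(e_i, x)|^2 for every basis vector e_i
norm_sq = np.linalg.norm(x) ** 2

sizes = {}
for n in (1, 2, 5, 10, 100, 1000):
    size = int(np.sum(weights > norm_sq / n))   # number of vectors in S_n
    sizes[n] = size
    assert size <= n - 1                        # the bound from the proof
print(sizes)
```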

Theorem 12.19 now allows us to prove the general (i.e., infinite-dimensional) version of Bessel's inequality. Keep in mind that an arbitrary
orthonormal set may consist of an uncountable number of elements, and in
this case we do not write any limits in the sum $\sum |(e_i, x)|^2$.

Theorem 12.20 If $\{e_i\}$ is any orthonormal set in a Hilbert space $H$ and $x$ is
any vector in $H$, then $\sum |(e_i, x)|^2 \le \|x\|^2$.

Proof First note that if $e_\alpha \in \{e_i\}$ is such that $(e_\alpha, x) = 0$, then this $e_\alpha$ will not
contribute to $\sum |(e_i, x)|^2$. As in Theorem 12.19, we again consider the set

$$S = \{e_i : (e_i, x) \neq 0\} .$$

If $S = \varnothing$, then we have $\sum |(e_i, x)|^2 = 0$ and the conclusion is obviously true. If
$S \neq \varnothing$, then according to Theorem 12.19 it must contain a countable number of
vectors. If $S$ is in fact finite, then we write $S = \{e_1, \dots, e_n\}$ and the theorem
follows from the finite-dimensional Bessel inequality proved above. Thus we
need only consider the case where $S$ is countably infinite.

We may consider the vectors in $S$ to be arranged in any arbitrary (but now
fixed) order $\{e_1, e_2, \dots, e_n, \dots\}$. From the corollary to Theorem B20 we
know that if $\sum_{i=1}^\infty |(e_i, x)|^2$ converges, then this sum is independent of any
rearrangement of the terms in the series. This then gives an unambiguous
meaning to the expression $\sum |(e_i, x)|^2 = \sum_{i=1}^\infty |(e_i, x)|^2$. Therefore we see that the
sum is a nonnegative (extended) real number that depends only on the set $S$
and not on the order in which the vectors in $S$ are written. If we let

$$s_n = \sum_{i=1}^n |(e_i, x)|^2$$

be the $n$th partial sum of the series, then the finite-dimensional version of
Bessel's inequality shows that $s_n \le \|x\|^2$ for every $n$. Since the $s_n$ are nondecreasing and bounded, the series converges, and hence we must have

$$\sum_{i=1}^\infty |(e_i, x)|^2 \le \|x\|^2 .$$

∎

Theorem 12.21 Let $\{e_i\}$ be an orthonormal set in a Hilbert space $H$, and let $x$
be any vector in $H$. Then $\big(x - \sum (e_i, x) e_i\big) \perp e_j$ for each $j$.

Proof Just as we did in the proof of Theorem 12.20, we must first make
precise the meaning of the expression $\sum (e_i, x) e_i$. Therefore we again define the
set $S = \{e_i : (e_i, x) \neq 0\}$. If $S = \varnothing$, then we have $\sum (e_i, x) e_i = 0$, so that our
theorem is obviously true since the definition of $S$ then means that $x \perp e_j$ for
every $j$. If $S$ is finite but nonempty, then the theorem reduces to the finite case
proved in the discussion prior to Theorem 12.19. Thus, by Theorem 12.19, we
are again left with the case where $S$ is countably infinite. We first prove the
result for a particular ordering $S = \{e_1, e_2, \dots\}$, and afterwards we will show
that our result is independent of the ordering chosen.

Let $s_n = \sum_{i=1}^n (e_i, x) e_i$. Since the Pythagorean theorem may be applied to
any finite collection of orthogonal vectors, the fact that $\{e_i\}$ is an orthonormal
set allows us to write (for $m > n$)

$$\|s_m - s_n\|^2 = \Big\|\sum_{i=n+1}^m (e_i, x) e_i\Big\|^2 = \sum_{i=n+1}^m |(e_i, x)|^2 .$$

Now, Bessel's inequality shows that $\sum_{i=1}^\infty |(e_i, x)|^2$ must converge, and hence
for any $\varepsilon > 0$ there exists $N$ such that $m > n > N$ implies $\sum_{i=n+1}^m |(e_i, x)|^2 < \varepsilon^2$
(this is just Theorem B17). This shows that $\{s_n\}$ is a Cauchy sequence in $H$,
and thus the fact that $H$ is complete implies that $s_n \to s = \sum_{i=1}^\infty (e_i, x) e_i \in H$. If
we define $\sum (e_i, x) e_i = \sum_{i=1}^\infty (e_i, x) e_i = s$, then the continuity of the inner product
yields

$$\begin{aligned}
(e_j, x - s) &= (e_j, x) - (e_j, s) = (e_j, x) - (e_j, \lim s_n) \\
&= (e_j, x) - \lim\, (e_j, s_n) = (e_j, x) - \lim \sum_{i=1}^n (e_i, x)(e_j, e_i) \\
&= (e_j, x) - (e_j, x) = 0 .
\end{aligned}$$

Thus we have shown that $(x - s) \perp e_j$ for every $j$.

We now show that this result is independent of the particular order chosen
for the $\{e_i\}$ in the definition of $s$. Our proof of this fact is similar to the proof
of the corollary to Theorem B20. Let $\{e'_i\}$ be any other arrangement of the set
$\{e_i\}$, and let $s'_n = \sum_{i=1}^n (e'_i, x) e'_i$. Repeating the above argument shows that $s'_n$
converges to a limit $s' = \sum_{i=1}^\infty (e'_i, x) e'_i$. We must show that $s' = s$. Since $\{s_n\}$,
$\{s'_n\}$ and $\sum_{i=1}^\infty |(e_i, x)|^2$ all converge, we see that for any $\varepsilon > 0$, there exists $N > 0$
such that $n \ge N$ implies

$$\|s_n - s\| < \varepsilon, \qquad \|s'_n - s'\| < \varepsilon$$

and

$$\sum_{i=N+1}^\infty |(e_i, x)|^2 < \varepsilon^2$$

(this last inequality follows by letting $m \to \infty$ in Theorem B17). We now note
that since there are only a finite number of terms in $s_N$ and $\{e'_i\}$ is just a
rearrangement of $\{e_i\}$, there must exist an integer $M > N$ such that every term
in $s_N$ also occurs in $s'_M$. Then $s'_M - s_N$ contains a finite number of terms,
each of which is of the form $(e_i, x) e_i$ for some $i \in \{N + 1, N + 2, \dots\}$. By the Pythagorean theorem we therefore
have

$$\|s'_M - s_N\|^2 \le \sum_{i=N+1}^\infty |(e_i, x)|^2 < \varepsilon^2$$

and hence $\|s'_M - s_N\| < \varepsilon$. Putting all of this together, we have

$$\|s' - s\| \le \|s' - s'_M\| + \|s'_M - s_N\| + \|s_N - s\| < 3\varepsilon$$

and hence, since $\varepsilon > 0$ was arbitrary, $s' = s$. ∎

At last we are in a position to describe the infinite-dimensional analogue
of the expansion of a vector in terms of a basis. Let $H$ be a nonzero Hilbert
space, and let $\{x_i\}$ be an arbitrary collection of vectors in $H$ such that $\|x_i\| \neq 0$
for each $i$. (We required $H$ to be nonzero so that such a collection will exist.)
For each finite subcollection $\{x_{i_1}, \dots, x_{i_n}\}$ of $\{x_i\}$, we can form the vector
space spanned by this subcollection of vectors. In other words, we can
consider the space consisting of all linear combinations of the form $c_1 x_{i_1} + \cdots + c_n x_{i_n}$, where each $c_j$ is a complex number. In order to simplify our notation, we
will generally omit the double indices and write simply $\{x_1, \dots, x_n\}$.

Now consider the union of all vector spaces generated by such finite
subcollections of $\{x_i\}$. This union is clearly a vector space itself, and is called the
subspace generated by the collection $\{x_i\}$. Let us denote this space by $E$. We
say that the collection $\{x_i\}$ is total in $H$ if $E$ is dense in $H$ (i.e., $\operatorname{Cl} E = H$). In
other words, $\{x_i\}$ is total in $H$ if every vector in $H$ is the limit of a sequence of
vectors in $E$ (see Theorem B14(b)). A total orthonormal set $\{e_i\}$ is called a
Hilbert basis (or an orthonormal basis). Be careful to note, however, that
this is not the same as a basis in a finite-dimensional space. This is because,
when $H$ is infinite-dimensional, not every vector in $H$ can be written as a linear combination of a finite number
of elements in a Hilbert basis.

An equivalent way of formulating this property that is frequently used is
the following. Consider the family of all orthonormal subsets of a nonzero
Hilbert space $H$. We can order this family by ordinary set inclusion, and the
result is clearly a partially (but in general not totally) ordered set. In other words, if $S_1$
and $S_2$ are orthonormal sets, we say that $S_1 \le S_2$ if $S_1 \subset S_2$. We say that an
orthonormal set $\{e_i\}$ is complete if it is maximal in this partially ordered set.
This means that there is no nonzero vector $x \in H$ such that if we adjoin $e = x/\|x\|$ to $\{e_i\}$, the resulting set $\{e_i, e\}$ is also orthonormal and contains $\{e_i\}$ as a
proper subset. We now show the equivalence of this approach to the previous
paragraph.

Let $\{e_i\}$ be a complete orthonormal set in a Hilbert space $H$, and let $E$ be
the subspace generated by $\{e_i\}$. If $\operatorname{Cl} E \neq H$, then by Theorem 12.15 (applied to the closed subspace $\operatorname{Cl} E$) there
exists a nonzero vector $x \in H$ such that $x \perp \operatorname{Cl} E$. In particular, this means that
$x \perp E$, and hence the set $\{e_i, e = x/\|x\|\}$ would be a larger orthonormal set than
$\{e_i\}$, contradicting the maximality of $\{e_i\}$.

Conversely, suppose that $\{e_i\}$ is a Hilbert basis for $H$ (i.e., a total
orthonormal set). If $\{e_i\}$ is not complete, then there exists a nonzero vector $x \in H$
such that $\{e_i, e = x/\|x\|\}$ is an orthonormal set that contains $\{e_i\}$ as a proper
subset. Then $e \perp \{e_i\}$, and hence the subspace $E$ generated by $\{e_i\}$ must be a
subset of $e^\perp$. Since $e^\perp$ is closed, it follows that $\operatorname{Cl} E \subset e^\perp$. But then $e \perp \operatorname{Cl} E$,
which is impossible if $\operatorname{Cl} E = H$: it would give $(e, e) = 0$, contradicting $\|e\| = 1$.

Theorem 12.22 Every nonzero Hilbert space $H$ contains a complete
orthonormal set. Alternatively, every such $H$ has a Hilbert basis.

Proof Note that every chain of orthonormal sets in $H$ has an upper bound
given by the union of the sets in the chain (the union is again orthonormal, since any two of its
elements lie in a common member of the chain). By Zorn's lemma, the family of all
orthonormal sets, which is nonempty since $\{x/\|x\|\}$ is orthonormal for any $x \neq 0$, thus has a maximal element. This shows that $H$ contains a
complete orthonormal set. That $H$ has an orthonormal basis then follows from
the above discussion on the equivalence of a complete orthonormal set and a
Hilbert basis. ∎

Some of the most important basic properties of Hilbert spaces are
contained in our next theorem.

Theorem 12.23 Let $\{e_i\}$ be an orthonormal set in a Hilbert space $H$. Then
the following conditions are equivalent:

(1) $\{e_i\}$ is complete.

(2) $x \perp \{e_i\}$ implies $x = 0$.

(3) For any $x \in H$ we have $x = \sum (e_i, x) e_i$.

(4) For any $x \in H$ we have $\|x\|^2 = \sum |(e_i, x)|^2$.

Proof (1) $\Rightarrow$ (2): If (2) were not true, then there would exist a nonzero vector $x \in H$
with $x \perp \{e_i\}$, so that $e = x/\|x\|$ satisfies $e \perp \{e_i\}$, and hence $\{e_i, e\}$ would be an orthonormal
set larger than $\{e_i\}$, contradicting the completeness of $\{e_i\}$.

(2) $\Rightarrow$ (3): By Theorem 12.21, the vector $y = x - \sum (e_i, x) e_i$ is orthogonal
to $\{e_i\}$, and hence (2) implies that $y = 0$.

(3) $\Rightarrow$ (4): Using the joint continuity of the inner product (so that the sum
as a limit of partial sums can be taken outside the inner product), we simply
calculate

$$\|x\|^2 = (x, x) = \Big(\sum_i (e_i, x) e_i,\; \sum_j (e_j, x) e_j\Big) = \sum_i |(e_i, x)|^2 .$$

(4) $\Rightarrow$ (1): If $\{e_i\}$ is not complete, then there exists $e \in H$ such that $\{e_i, e\}$
is a larger orthonormal set in $H$. Since this means that $e \perp \{e_i\}$, statement (4)
yields $\|e\|^2 = \sum |(e_i, e)|^2 = 0$, which contradicts the assumption that $\|e\| = 1$. ∎

Note that the equivalence of (1) and (3) in this theorem is really just our
earlier statement that an orthonormal set is complete if and only if it is a
Hilbert basis. We also remark that statement (4) is sometimes called
Parseval's equation, although this designation also applies to the more
general result

$$(x, y) = \sum (x, e_i)(e_i, y)$$

(see Exercise 12.5.1).
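
For a complete orthonormal set in a finite-dimensional space, statements (3) and (4) and the general form of Parseval's equation can all be checked directly. The sketch below assumes $H = \mathbb{C}^5$ and uses the columns of a random unitary matrix (from a QR factorization) as the $e_i$. `np.vdot(u, v)` computes $(u, v)$ with the conjugate on the first argument, as in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 5
U, _ = np.linalg.qr(rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim)))
basis = [U[:, i] for i in range(dim)]      # a complete orthonormal set in C^5

x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
y = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)

# (3): x equals its Fourier expansion.
expansion = sum(np.vdot(e, x) * e for e in basis)
print(np.allclose(expansion, x))                                                      # True

# (4): ||x||^2 = sum |(e_i, x)|^2.
print(np.isclose(np.vdot(x, x).real, sum(abs(np.vdot(e, x)) ** 2 for e in basis)))    # True

# General Parseval: (x, y) = sum (x, e_i)(e_i, y).
print(np.isclose(np.vdot(x, y), sum(np.vdot(x, e) * np.vdot(e, y) for e in basis)))   # True
```

If one column is dropped from `basis`, the set is no longer complete, and the checks for (3) and (4) fail for a generic `x`, as the theorem predicts.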

It should be emphasized that we have so far considered the general case 
where an arbitrary Hilbert space H has a possibly uncountable orthonormal 
set. However, if H happens to be separable (i.e., H contains a countable dense 
subset), then we can show that every orthonormal set in H is in fact countable. 

Theorem 12.24 Every orthonormal set $\{e_i\}$ in a separable Hilbert space $H$
contains at most a countable number of elements.

Proof We first note that by the Pythagorean theorem we have

$$\|e_i - e_j\|^2 = \|e_i\|^2 + \|e_j\|^2 = 2$$

and hence $\|e_i - e_j\| = \sqrt{2}$ for every $i \neq j$. If we consider the set $\{B(e_i, 1/2)\}$ of
open balls of radius $1/2$, then the fact that $2(1/2) = 1 < \sqrt{2}$ implies that these
balls are pairwise disjoint. Now let $\{x_n\}$ be a countable dense subset of $H$.
This means that any neighborhood of any element of $H$ must contain at least
one of the $x_n$. In particular, each of the open balls $B(e_i, 1/2)$ must contain at
least one of the $x_n$. Since distinct balls are disjoint, assigning to each ball an $x_n$ that it
contains defines a one-to-one map into the countable set $\{x_n\}$, so there can be only a countable number of such
balls. Therefore the set $\{e_i\}$ must in fact be countable. ∎

It is worth remarking that if we are given any countable set of linearly
independent vectors $\{x_i\}$ in a Hilbert space $H$, then the Gram-Schmidt
procedure (see the corollary to Theorem 2.21) may be applied to yield a countable
orthonormal set $\{e_i\}$ such that for any $n$, the space spanned by $\{e_1, \dots, e_n\}$ is
the same as the space spanned by $\{x_1, \dots, x_n\}$. It then follows that $\{e_i\}$ is
complete if and only if $\{x_i\}$ is total.
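
Here is a minimal sketch of the classical Gram-Schmidt procedure in $\mathbb{C}^n$. It assumes the inner product $(u, v) = \sum_k \overline{u_k}\,v_k$, computed by `np.vdot`. The function name `gram_schmidt` is our own. The final checks confirm two properties of the output. First, it is orthonormal. Second, each $x_k$ lies in the span of $e_1, \dots, e_k$, because the matrix of coefficients $(e_j, x_k)$ is upper triangular. This is the span-preserving property used above.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize linearly independent vectors in order.

    Each x_k has its components along e_1, ..., e_{k-1} subtracted off
    (using the Fourier coefficients (e_j, x_k)), and the remainder is normalized.
    """
    basis = []
    for x in vectors:
        v = x.astype(complex)
        for e in basis:
            v = v - np.vdot(e, x) * e
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append(v / norm)
    return basis

rng = np.random.default_rng(6)
xs = [rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3)]
es = gram_schmidt(xs)

G = np.array([[np.vdot(ei, ej) for ej in es] for ei in es])
print(np.allclose(G, np.eye(3)))                 # True: orthonormal

# R[j, k] = (e_j, x_k) vanishes for j > k, so x_k lies in span{e_1, ..., e_k}.
R = np.array([[np.vdot(ej, xk) for xk in xs] for ej in es])
print(np.allclose(np.tril(R, -1), 0))            # True
```

The classical form is used to mirror the textbook procedure. In floating point, the modified variant or a QR factorization such as `np.linalg.qr` is usually more stable.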

Finally, suppose that we have a countable (but not necessarily complete)
orthonormal set $\{e_k\}$ in a Hilbert space $H$. From Bessel's inequality, it follows
that a necessary condition for a set of scalars $c_1, c_2, \dots$ to be the Fourier
coefficients of some $x \in H$ is that $\sum_{k=1}^\infty |c_k|^2 \le \|x\|^2$. In other words, the series
$\sum_{k=1}^\infty |c_k|^2$ must converge. That this is also a sufficient condition is the content
of our next result, which is a special case of the famous Riesz-Fischer
theorem.

Theorem 12.25 (Riesz-Fischer) Let $\{e_k\}$ be a countable orthonormal set in a Hilbert
space $H$, and let $\{c_k\}$ be a collection of scalars such that the series $\sum_{k=1}^\infty |c_k|^2$
converges. Then there exists a vector $x \in H$ with $\{c_k\}$ as its Fourier
coefficients, i.e., $c_k = (e_k, x)$ for every $k$. Moreover, $x$ may be chosen so that $\sum_{k=1}^\infty |c_k|^2 = \|x\|^2$.

Proof For each $n$, define the vector

$$x_n = \sum_{k=1}^n c_k e_k$$

and note that $c_k = (e_k, x_n)$ for $k \le n$. Since $\sum_{k=1}^\infty |c_k|^2$ converges, it follows from
Theorem B17 that for each $\varepsilon > 0$, there exists $N$ such that $n > m > N$ implies

$$\sum_{k=m+1}^n |c_k|^2 < \varepsilon .$$

Using the Pythagorean theorem, we then see that $n > m > N$ implies

$$\|x_n - x_m\|^2 = \Big\|\sum_{k=m+1}^n c_k e_k\Big\|^2 = \sum_{k=m+1}^n \|c_k e_k\|^2 = \sum_{k=m+1}^n |c_k|^2 < \varepsilon$$

and hence $\{x_n\}$ is a Cauchy sequence in $H$. Since $H$ is complete, there exists a
vector $x \in H$ such that $\|x_n - x\| \to 0$. In addition, we note that we may write

$$(e_k, x) = (e_k, x_n) + (e_k, x - x_n) \tag{6}$$

where, for $n \ge k$, the first term on the right hand side is just $c_k$. From the
Cauchy-Schwarz inequality we see that

$$|(e_k, x - x_n)| \le \|e_k\|\,\|x - x_n\| = \|x - x_n\|$$

and thus letting $n \to \infty$ shows that $(e_k, x - x_n) \to 0$. Since the left hand side of
(6) is independent of $n$, we then see that

$$(e_k, x) = c_k .$$

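Using this result, we now let $n \to \infty$ to obtain (since $\|x - x_n\| \to 0$)

$$0 = \lim_{n \to \infty} \|x - x_n\|^2 = \lim_{n \to \infty} \Big(x - \sum_{k=1}^n c_k e_k,\; x - \sum_{k=1}^n c_k e_k\Big) = \lim_{n \to \infty} \Big(\|x\|^2 - \sum_{k=1}^n |c_k|^2\Big) .$$

In other words, we have

$$\lim_{n \to \infty} \sum_{k=1}^n |c_k|^2 = \sum_{k=1}^\infty |c_k|^2 = \|x\|^2 .$$

∎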
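
To see the construction at work, take the standard orthonormal set of Exercise 12.5.2 in $\ell_2$ and the square-summable coefficients $c_k = 1/k$, for which $\sum_{k \ge 1} |c_k|^2 = \pi^2/6$. The NumPy sketch below represents sequences by their first `N` entries. This truncation is an assumption made for illustration, so the vectors actually live in $\mathbb{R}^N$. The sketch checks that $\|x_n - x_m\|^2 = \sum_{k=m+1}^n |c_k|^2$ and shows $\|x_N\|^2$ approaching $\pi^2/6$.

```python
import numpy as np

N = 100_000                          # truncation length (an illustrative assumption)
k = np.arange(1, N + 1)
c = 1.0 / k                          # c_k = 1/k, a square-summable sequence

def partial_sum(n):
    """x_n = sum_{k<=n} c_k e_k, with e_k the k-th standard unit vector."""
    x = np.zeros(N)
    x[:n] = c[:n]
    return x

m, n = 1_000, 5_000
lhs = np.linalg.norm(partial_sum(n) - partial_sum(m)) ** 2
rhs = np.sum(c[m:n] ** 2)            # the sum over k = m+1, ..., n
print(np.isclose(lhs, rhs))          # True

x_N = partial_sum(N)
print(np.pi ** 2 / 6 - x_N @ x_N)    # a small positive gap: the tail sum over k > N
```

The remaining gap is the tail $\sum_{k > N} 1/k^2$, which lies between $1/(N+1)$ and $1/N$.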

Exercises 

1. If $\{e_i\}$ is a complete orthonormal set in a Hilbert space $H$ and $x, y \in H$,
prove that $(x, y) = \sum_i (x, e_i)(e_i, y)$.

2. Let $e_n$ denote the sequence with a 1 in the $n$th position and 0's elsewhere.
Show that $\{e_1, e_2, \dots, e_n, \dots\}$ is a complete orthonormal set in $\ell_2$.

3. Prove that a Hilbert space $H$ is separable if and only if every orthonormal
set in $H$ is countable.

4. (a) Show that an orthonormal set in a Hilbert space is linearly independent.

(b) Show that a Hilbert space is finite-dimensional if and only if every
complete orthonormal set is a basis.

5. Let $S$ be a nonempty set, and let $\ell_2(S)$ denote the set of all complex-valued
functions $f$ defined on $S$ with the property that:

(i) $\{s \in S : f(s) \neq 0\}$ is countable (but possibly empty).

(ii) $\sum |f(s)|^2 < \infty$.

It should be clear that $\ell_2(S)$ forms a complex vector space with respect to
pointwise addition and scalar multiplication.

(a) Show that $\ell_2(S)$ becomes a Hilbert space if we define the norm and
inner product by $\|f\| = \big(\sum |f(s)|^2\big)^{1/2}$ and $(f, g) = \sum f(s)^* g(s)$.

(b) Show that the subset of $\ell_2(S)$ consisting of functions that have the
value 1 at a single point and 0 elsewhere forms a complete orthonormal
set.

(c) Now let $S = \{e_i\}$ be a complete orthonormal set in a Hilbert space $H$.
Each $x \in H$ defines a function $f$ on $S$ by $f(e_i) = (e_i, x)$. Show that $f$ is in
$\ell_2(S)$.

(d) Show that the mapping $x \mapsto f$ is an isometric (i.e., norm preserving)
isomorphism of $H$ onto $\ell_2(S)$.

12.6 BOUNDED OPERATORS ON A HILBERT SPACE 

One of the most important concepts in quantum theory is that of self-adjoint
operators on a Hilbert space. We now begin a discussion of the existence of
operator adjoints. While the existence of the adjoint in a finite-dimensional
space was easy enough to prove, the infinite-dimensional analogue requires
slightly more care. Therefore, our first goal is to prove one version of a
famous result known as the Riesz representation theorem, which is the Hilbert
space analogue of Theorem 10.1.

As usual, we let $E^*$ denote the dual space (which is also frequently called
the conjugate space) to the normed vector space $E$. In other words, $E^*$ is just the
space $L(E, \mathbb{C})$ of continuous linear maps of $E$ into $\mathbb{C}$. Elements of $E^*$ are
called functionals, and it is important to remember that this designation
implies that the map is continuous (and hence bounded). If $f \in E^*$, we may
define the norm of $f$ as usual by

$$\|f\| = \sup\{|f(x)| : \|x\| = 1\} .$$

Since $\mathbb{C}$ is clearly a Banach space, it follows from Theorem 12.13 that $E^*$ is
also a Banach space (even if $E$ is not complete).

If $y$ is any (fixed) vector in a Hilbert space $H$, then we define the function
$f_y: H \to \mathbb{C}$ by $f_y: x \mapsto (y, x)$. Since the inner product is continuous, it follows
that $f_y$ is continuous. Furthermore, we note that for any $x_1, x_2 \in H$ and $a \in \mathbb{C}$
we have

$$f_y(x_1 + x_2) = (y, x_1 + x_2) = (y, x_1) + (y, x_2) = f_y(x_1) + f_y(x_2)$$

and

$$f_y(a x_1) = (y, a x_1) = a(y, x_1) = a f_y(x_1)$$

and hence $f_y$ is linear. This shows that $f_y \in H^* = L(H, \mathbb{C})$, and therefore we
may define

$$\|f_y\| = \sup\{|f_y(x)| : \|x\| = 1\} .$$

Using Example 12.1, we see that $|(y, x)| \le \|y\|\,\|x\|$, and thus (by definition
of sup)

$$\|f_y\| = \sup\{|(y, x)| : \|x\| = 1\} \le \|y\| .$$

On the other hand, we see that $y = 0$ implies $\|f_y\| = 0 = \|y\|$, while if $y \neq 0$ then
(again by the definition of sup)

$$\|f_y\| = \sup\{|f_y(x)| : \|x\| = 1\} \ge |f_y(y/\|y\|)| = |(y, y/\|y\|)| = \|y\| .$$

We thus see that in fact $\|f_y\| = \|y\|$, and hence the map $y \mapsto f_y$ preserves the
norm.

However, the mapping $y \mapsto f_y$ is not linear. While it is true that

$$f_{y_1 + y_2}(x) = (y_1 + y_2, x) = (f_{y_1} + f_{y_2})(x)$$

and hence $f_{y_1 + y_2} = f_{y_1} + f_{y_2}$, we also have

$$f_{ay}(x) = (ay, x) = a^*(y, x) = a^* f_y(x)$$

so that $f_{ay} = a^* f_y$. This shows that the map $y \mapsto f_y$ is really a norm preserving
antilinear mapping of $H$ into $H^*$. We also note that

$$\|f_{y_1} - f_{y_2}\| = \|f_{y_1 - y_2}\| = \|y_1 - y_2\|$$

which shows that the map $y \mapsto f_y$ is an isometry.
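
These properties are easy to observe in $\mathbb{C}^n$. The sketch below assumes $H = \mathbb{C}^4$ with $(y, x) = \sum_k \overline{y_k}\,x_k$, which is NumPy's `np.vdot(y, x)`. It defines $f_y$ as a Python closure via a hypothetical helper `make_functional`. It checks that each $f_y$ is linear while $f_{ay} = a^* f_y$. It also checks that $|f_y(x)| \le \|y\|$ on sampled unit vectors, with equality at $x = y/\|y\|$, so that $\|f_y\| = \|y\|$.

```python
import numpy as np

def make_functional(y):
    """Hypothetical helper: return f_y, the map x -> (y, x) = sum(conj(y) * x)."""
    return lambda x: np.vdot(y, x)

rng = np.random.default_rng(7)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
a = 2.0 - 3.0j

f_y = make_functional(y)
f_ay = make_functional(a * y)
print(np.isclose(f_ay(x), np.conj(a) * f_y(x)))   # True: y -> f_y is antilinear
print(np.isclose(f_y(a * x), a * f_y(x)))         # True: each f_y is linear

# |f_y(x)| <= ||y|| on unit vectors, with equality at x = y/||y||.
samples = rng.standard_normal((1000, 4)) + 1j * rng.standard_normal((1000, 4))
units = samples / np.linalg.norm(samples, axis=1, keepdims=True)
values = np.abs([f_y(u) for u in units])
print(values.max() <= np.linalg.norm(y) + 1e-12)                       # True
print(np.isclose(abs(f_y(y / np.linalg.norm(y))), np.linalg.norm(y)))  # True
```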

What we have shown so far is that given any $y \in H$, there exists a linear
functional $f_y \in H^*$ where the association $y \mapsto f_y = (y, \cdot)$ is a norm preserving
antilinear map. It is of great importance that this mapping is actually an
(antilinear) isomorphism of $H$ onto $H^*$. In other words, any element of $H^*$ may be written in
the form $f_y = (y, \cdot)$ for a unique $y \in H$. We now prove this result, which is a
somewhat restricted form of the Riesz representation theorem.

Theorem 12.26 (Riesz Representation Theorem) Let $H$ be a Hilbert space. Then given any $f \in H^*$, there exists a unique $y \in H$ such that

$$f(x) = \langle y, x\rangle \qquad (7)$$

for every $x \in H$. Moreover we have $\|y\| = \|f\|$.
Proof Assuming the existence of such a $y \in H$, it is easy to show that it must be unique. To see this, we simply note that if $f(x) = \langle y_1, x\rangle$ and $f(x) = \langle y_2, x\rangle$ for every $x \in H$, then $0 = \langle y_1, x\rangle - \langle y_2, x\rangle$ implies $0 = \langle y_1 - y_2, x\rangle$. But this holds for all $x \in H$, and hence we must have $y_1 - y_2 = 0$, which implies $y_1 = y_2$. (If $\langle y, x\rangle = 0$ for all $x \in H$, then in particular $0 = \langle y, y\rangle = \|y\|^2$ implies $y = 0$.)

We now prove that such a $y$ does indeed exist. First note that if $f = 0$, then we may choose $y = 0$ to satisfy the theorem. Therefore we now assume that $f \ne 0$. Let $M = \operatorname{Ker} f$. We know that $\operatorname{Ker} f$ is a subspace of $H$, and since $f \ne 0$ we must have $M \ne H$. Furthermore, if $\{x_n\}$ is a sequence in $M$ that converges to some $x \in H$, then the continuity of $f$ shows that

$$f(x) = f(\lim x_n) = \lim f(x_n) = 0$$

and hence $x \in M$. This shows that $M$ is a proper closed subspace of $H$ (by Theorem B14(a)). By Theorem 12.15, there exists a nonzero $y_0 \in H$ such that $y_0 \perp M$. We claim that $y = \alpha y_0$ will satisfy our requirements for a suitably chosen scalar $\alpha$.

First note that for any scalar $\alpha$ and any $x \in M$, we have $f(x) = 0$ on the one hand, while $\langle \alpha y_0, x\rangle = \alpha^*\langle y_0, x\rangle = 0$ on the other (since $y_0 \perp x$). This shows that (7) is true for every $x \in M$ no matter how we choose $\alpha$. However, if we now require that (7) hold for the vector $x = y_0$ (where $y_0 \notin M$, because $y_0 \ne 0$ and $y_0 \perp M$), then we must also have

$$f(y_0) = \langle \alpha y_0, y_0\rangle = \alpha^*\|y_0\|^2$$

which leads us to choose $\alpha = f(y_0)^*/\|y_0\|^2$. With this choice of $\alpha$, we have then shown that (7) holds for all $x \in M$ and for the vector $x = y_0$. We now show that in fact (7) holds for every $x \in H$.

We observe that, since $y_0 \notin M$ means $f(y_0) \ne 0$, any $x \in H$ may be written as

$$x = x - [f(x)/f(y_0)]y_0 + [f(x)/f(y_0)]y_0$$

where $x - [f(x)/f(y_0)]y_0 \in M$. In other words, any $x \in H$ may be written in the form $x = m + \beta y_0$ where $m \in M$ and $\beta = f(x)/f(y_0)$. Since $f$ is linear, we now see that our previously shown special cases result in (setting $y = \alpha y_0$)

$$f(x) = f(m + \beta y_0) = f(m) + \beta f(y_0) = \langle y, m\rangle + \beta\langle y, y_0\rangle = \langle y, m + \beta y_0\rangle = \langle y, x\rangle$$

and hence $f(x) = \langle y, x\rangle$ for every $x \in H$.
Finally, the fact that $\|y\| = \|f\|$ was shown in the discussion prior to the theorem. ∎
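The existence half of the proof can be traced numerically in a finite-dimensional setting. The sketch below assumes $H = \mathbb{C}^3$ and uses a hypothetical functional `f`, chosen only for illustration. In $\mathbb{C}^n$, the vector with components $f(e_i)^*$ is orthogonal to $\operatorname{Ker} f$. The sketch rescales that vector by an arbitrary complex factor to obtain `y0`. Following the proof, it then forms $\alpha = f(y_0)^*/\|y_0\|^2$ and checks that $y = \alpha y_0$ satisfies (7). The rescaling is undone by $\alpha$, so $y_i = f(e_i)^*$.

```python
import math

def inner(y, x):
    """<y, x> = sum conj(y_i) x_i (antilinear in the first argument)."""
    return sum(complex(yi).conjugate() * xi for yi, xi in zip(y, x))

def norm(x):
    return math.sqrt(inner(x, x).real)

def f(x):
    """A hypothetical bounded linear functional on C^3, for illustration only."""
    return (2 - 1j) * x[0] + 4j * x[1] - 3 * x[2]

n = 3
basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
c = [complex(f(e)) for e in basis]              # c_i = f(e_i)

# A nonzero y0 orthogonal to Ker f: <y0, x> = (1 + 2j) f(x), which vanishes on Ker f.
y0 = [(1 - 2j) * ci.conjugate() for ci in c]

# The scalar chosen in the proof: alpha = f(y0)^* / ||y0||^2.
alpha = complex(f(y0)).conjugate() / norm(y0) ** 2
y = [alpha * v for v in y0]                     # the representer y = alpha * y0

samples = [[1, 0, 0], [1j, -2, 0.5], [3 - 1j, 2 + 2j, -1j]]
max_gap = max(abs(f(x) - inner(y, x)) for x in samples)
```

Here $\|y\| = \sqrt{30}$, which is also the norm of this `f` on $\mathbb{C}^3$, in line with $\|y\| = \|f\|$.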

If $H$ is a Hilbert space, we define an inner product on $H^*$ by

$$\langle f_y, f_x\rangle = \langle x, y\rangle.$$

Note the order of the vectors $x$ and $y$ in this definition. This is to ensure that the inner product on $H^*$ has the correct linearity properties. In other words, using the fact that the mapping $y \mapsto f_y$ is antilinear, we have

$$\langle af_y, f_x\rangle = \langle f_{a^*y}, f_x\rangle = \langle x, a^*y\rangle = a^*\langle x, y\rangle = a^*\langle f_y, f_x\rangle$$

and

$$\langle f_y, af_x\rangle = \langle f_y, f_{a^*x}\rangle = \langle a^*x, y\rangle = a\langle x, y\rangle = a\langle f_y, f_x\rangle.$$

Using $f_{y_1} + f_{y_2} = f_{y_1 + y_2}$ it is trivial to verify that

$$\langle f_{y_1} + f_{y_2}, f_x\rangle = \langle f_{y_1}, f_x\rangle + \langle f_{y_2}, f_x\rangle.$$

This inner product induces a norm on $H^*$ in the usual way.

We claim that $H^*$ is also a Hilbert space. To see this, let $\{f_{x_i}\}$ be a Cauchy sequence in $H^*$. Given $\varepsilon > 0$, there exists $N$ such that $m > n > N$ implies that $\|f_{x_m} - f_{x_n}\| < \varepsilon$. Then, since every $f_{x_n}$ corresponds to a unique $x_n$ (the kernel of the mapping $x \mapsto f_x$ is $\{0\}$), it should be obvious that $\{x_n\}$ will be a Cauchy sequence in $H$. However, we can also show this directly as follows:

$$\begin{aligned}
\|f_{x_m} - f_{x_n}\|^2 &= \langle f_{x_m} - f_{x_n}, f_{x_m} - f_{x_n}\rangle \\
&= \langle f_{x_m}, f_{x_m}\rangle - \langle f_{x_m}, f_{x_n}\rangle - \langle f_{x_n}, f_{x_m}\rangle + \langle f_{x_n}, f_{x_n}\rangle \\
&= \langle x_m, x_m\rangle - \langle x_n, x_m\rangle - \langle x_m, x_n\rangle + \langle x_n, x_n\rangle \\
&= \langle x_m - x_n, x_m\rangle - \langle x_m - x_n, x_n\rangle \\
&= \langle x_m - x_n, x_m - x_n\rangle \\
&= \|x_m - x_n\|^2.
\end{aligned}$$

This shows that $\{x_n\}$ is a Cauchy sequence in $H$, and hence $x_n \to x \in H$. But then $f_{x_n} \to f_x \in H^*$, which shows that $H^*$ is complete, and hence $H^*$ is also a Hilbert space.
Example 12.10 Recalling the Banach space $l_p^n$ defined in Example 12.7, we shall show that if $1 < p < \infty$ and $1/p + 1/q = 1$, then $(l_p^n)^* = l_q^n$. By this equality sign, we mean there exists a norm-preserving isomorphism of $l_q^n$ onto $(l_p^n)^*$. If $\{e_i\}$ is the standard basis for $\mathbb{R}^n$, then any $x = (x_1, \dots, x_n) \in l_p^n$ may be written in the form

$$x = \sum_{i=1}^n x_ie_i.$$

Now let $f$ be a linear mapping of $l_p^n$ into any normed space (although we shall be interested only in the normed space $\mathbb{C}$). By Corollary 2 of Theorem 12.12, we know that $f$ is (uniformly) continuous. Alternatively, we can show this directly as follows. The linearity of $f$ shows that

$$f(x) = \sum_{i=1}^n x_if(e_i)$$

and hence

$$\|f(x)\| = \Big\|\sum_{i=1}^n x_if(e_i)\Big\| \le \sum_{i=1}^n |x_i|\,\|f(e_i)\| \le \max_i\{\|f(e_i)\|\}\sum_{i=1}^n |x_i|.$$

But

$$|x_i|^p \le \sum_{j=1}^n |x_j|^p = (\|x\|_p)^p$$

and therefore $|x_i| \le \|x\|_p$. If we write $K = \max_i\{\|f(e_i)\|\}$, then this leaves us with

$$\|f(x)\| \le nK\|x\|_p$$

which shows that $f$ is bounded and hence continuous (Theorem 12.12).

We now restrict ourselves to the particular case that $f: l_p^n \to \mathbb{C}$, and we then see that the set of all such $f$'s is just the dual space $(l_p^n)^*$. Since $f$ is bounded, we can define the norm of $f$ in the usual way by

$$\|f\| = \inf\{K > 0 : |f(x)| \le K\|x\|_p \text{ for all } x \in l_p^n\}.$$

Now note that for each $i = 1, \dots, n$ the result of $f$ applied to $e_i$ is just some scalar $y_i = f(e_i) \in \mathbb{C}$. Since $f(x) = \sum_{i=1}^n x_if(e_i) = \sum_{i=1}^n x_iy_i$, we see that specifying each of the $y_i$'s will determine $f$, and conversely, $f$ determines each of the $y_i$'s. We therefore have an isomorphism $y = (y_1, \dots, y_n) \mapsto f$ of the space of all $n$-tuples $y = (y_1, \dots, y_n) \in \mathbb{C}^n$ of scalars onto the space $(l_p^n)^*$ of all linear functionals $f$ on $l_p^n$ defined by $f(x) = \sum_{i=1}^n x_iy_i$. Because of this isomorphism, we want to know what norm to define on the set of all such $y$'s so that the mapping $y \mapsto f$ is an isometry.

For any $x \in l_p^n$, Hölder's inequality (see Example 12.7) yields

$$|f(x)| = \Big|\sum_{i=1}^n x_iy_i\Big| \le \sum_{i=1}^n |x_iy_i| \le \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p}\Big(\sum_{i=1}^n |y_i|^q\Big)^{1/q} = \|x\|_p\|y\|_q.$$

By definition, this implies that $\|f\| \le \|y\|_q$ (since $\|f\|$ is just the greatest lower bound of the set of all bounds for $f$). We will show that in fact $\|f\| = \|y\|_q$.

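Consider the vector $x = (x_1, \dots, x_n)$ defined by $x_i = 0$ if $y_i = 0$, and $x_i = |y_i|^q/y_i$ if $y_i \ne 0$. Then $|x_i| = |y_i|^{q-1}$, and since $1/p + 1/q = 1$ gives $p(q - 1) = q$, we have $|x_i|^p = |y_i|^q$ for every $i$. Using the fact that $1/p = 1 - 1/q$ we find

$$\|x\|_p\|y\|_q = \Big(\sum_{i=1}^n |y_i|^q\Big)^{1/p}\Big(\sum_{i=1}^n |y_i|^q\Big)^{1/q} = \Big(\sum_{i=1}^n |y_i|^q\Big)^{1/p + 1/q} = \sum_{i=1}^n |y_i|^q.$$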

On the other hand, we also see that for this $x$ we have

$$|f(x)| = \Big|\sum_{i=1}^n x_iy_i\Big| = \Big|\sum_{i=1}^n |y_i|^q\Big| = \sum_{i=1}^n |y_i|^q.$$

Thus, for this particular $x$, we find that $|f(x)| = \|x\|_p\|y\|_q$, and hence in general we must have $\|f\| = \|y\|_q$ (since it should now be obvious that nothing smaller than $K = \|y\|_q$ can satisfy $|f(x)| \le K\|x\|_p$ for all $x$).

In summary, defining a norm on the space of all $n$-tuples $y = (y_1, \dots, y_n)$ by $\|y\|_q$, we have constructed a norm-preserving isomorphism of $l_q^n$ onto $(l_p^n)^*$ as desired.
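The extremal vector in this argument is easy to test numerically. The following Python sketch builds $x_i = |y_i|^q/y_i$ and confirms that it attains Hölder's bound, so that $|f(x)| = \|x\|_p\|y\|_q = \sum_i |y_i|^q$. It also checks that random vectors never beat this ratio. The helper names, the choice $p = 3$ and the vector `y` are arbitrary assumptions made for illustration. The pairing $f(x) = \sum_i x_iy_i$ involves no complex conjugation.

```python
import random

def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def pairing(y):
    """f(x) = sum_i x_i y_i, the functional on l_p^n determined by y."""
    return lambda x: sum(xi * yi for xi, yi in zip(x, y))

def extremal_x(y, q):
    """x_i = |y_i|^q / y_i if y_i != 0, and x_i = 0 otherwise."""
    return [abs(t) ** q / t if t != 0 else 0 for t in y]

p = 3.0
q = p / (p - 1)                 # conjugate exponent: 1/p + 1/q = 1
y = [1 - 2j, 0, 0.5j, -3]
f = pairing(y)

x = extremal_x(y, q)
attained = abs(f(x))
holder_bound = p_norm(x, p) * p_norm(y, q)

# Hoelder's inequality: no sampled z gives |f(z)| / ||z||_p above ||y||_q.
rng = random.Random(0)
ratios = []
for _ in range(1000):
    z = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in y]
    ratios.append(abs(f(z)) / p_norm(z, p))
```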
In the particular case of $p = q = 2$, we know that $l_2^n$ is a Hilbert space, and hence $(l_2^n)^* = l_2^n$. Note also that in general,

$$(l_p^n)^{**} = (l_q^n)^* = l_p^n.$$

Any normed linear space $E$ for which $E^{**} = E$ is said to be reflexive. Thus we have shown that $l_p^n$ is a reflexive Banach space, and hence $l_2^n$ is a reflexive Hilbert space. In fact, it is not difficult to use the Riesz representation theorem to show that any Hilbert space is reflexive (see Exercise 12.6.1). ∆

Example 12.11 Recall the space $l_\infty$ defined in Exercise 12.3.5, and let $c_0$ denote the subspace consisting of all convergent sequences with limit 0. In other words, $x = \{x_1, x_2, \dots, x_n, \dots\}$ has the property that $x_n \to 0$ as $n \to \infty$. We shall show that $c_0^{**} = l_1^* = l_\infty$, and hence $c_0$ is not reflexive.

Let us first show that any bounded linear functional $f$ on $c_0$ is expressible in the form

$$f(x) = \sum_{i=1}^\infty f_ix_i$$

where

$$\sum_{i=1}^\infty |f_i| < \infty.$$

To see this, let $e_i = \{0, 0, \dots, 1, 0, \dots\}$ be the sequence with a 1 in the $i$th position and 0's elsewhere. Now let $f(x)$ be any bounded linear functional on $c_0$, and define $f_i = f(e_i)$. Note that if

$$x = \{x_1, x_2, \dots, x_n, 0, \dots\} \qquad (*)$$

then

$$x = x_1e_1 + x_2e_2 + \cdots + x_ne_n$$

and

$$f(x) = \sum_{i=1}^n f_ix_i.$$

Observe that if $\sum_{i=1}^\infty |f_i| = \infty$, then for every real $B$ it would be possible to find an integer $N$ such that

$$\sum_{i=1}^N |f_i| > B.$$

But then consider an element $x$ defined by

$$x_i = \begin{cases} 1 & \text{if } i \le N \text{ and } f_i > 0 \\ -1 & \text{if } i \le N \text{ and } f_i < 0 \\ 0 & \text{if } i \le N \text{ and } f_i = 0 \\ 0 & \text{if } i > N. \end{cases}$$

Clearly $\|x\| = \sup_i |x_i| = 1$, and

$$|f(x)| = \Big|\sum_{i=1}^N f_ix_i\Big| = \sum_{i=1}^N |f_i| > B = B\|x\|$$

which contradicts the assumed boundedness of $f$. Therefore we must have

$$\sum_{i=1}^\infty |f_i| < \infty.$$

It is not hard to see that the set of all elements of the form (*) is dense in $c_0$. Indeed, suppose we are given any $z = \{z_1, z_2, \dots, z_n, \dots\} \in c_0$. Then given $\varepsilon > 0$, we must find an $x$ of the above form with the property that

$$\|z - x\| = \sup_i |z_i - x_i| < \varepsilon.$$

Since $z \in c_0$ has the property that $z_n \to 0$ as $n \to \infty$, it follows that given $\varepsilon > 0$, there exists $M$ such that $|z_n| < \varepsilon$ for all $n > M$. If we choose $x = \{z_1, z_2, \dots, z_M, 0, \dots\}$, then clearly we will have $\|z - x\| < \varepsilon$.

By Corollary 1 of Theorem 12.12 any bounded linear functional is continuous. Together with Theorem 12.7(d), this shows that any bounded linear functional on $c_0$ is uniquely defined by its values on the dense set of elements of the form (*), and hence for every $x \in c_0$ we must have

$$f(x) = \sum_{i=1}^\infty f_ix_i.$$

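We now claim that the norm of any such linear functional is given by

$$\|f\| = \sum_{i=1}^\infty |f_i|.$$

First note that

$$|f(x)| \le \sum_{i=1}^\infty |f_i|\,|x_i| \le \|x\|\sum_{i=1}^\infty |f_i| = \|x\|\,a$$

where

$$a = \sum_{i=1}^\infty |f_i|.$$

Then for $x \ne 0$

$$\frac{|f(x)|}{\|x\|} \le \sum_{i=1}^\infty |f_i| = a$$

and hence

$$\|f\| = \sup\Big\{\frac{|f(x)|}{\|x\|} : x \ne 0\Big\} \le a.$$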



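On the other hand, it follows from Theorem B17 (see Appendix B) that given $\varepsilon > 0$, there exists $N$ such that

$$a - \varepsilon < \sum_{i=1}^N |f_i|.$$

If we define $x$ again by

$$x_i = \begin{cases} 1 & \text{if } i \le N \text{ and } f_i > 0 \\ -1 & \text{if } i \le N \text{ and } f_i < 0 \\ 0 & \text{if } i \le N \text{ and } f_i = 0 \\ 0 & \text{if } i > N \end{cases}$$

then $\|x\| = 1$ and

$$|f(x)| = \Big|\sum_{i=1}^\infty f_ix_i\Big| = \sum_{i=1}^N |f_i|$$

so that $|f(x)| > a - \varepsilon$. Therefore $\|f\| \ge |f(x)|/\|x\| > a - \varepsilon$, and since $\varepsilon > 0$ was arbitrary, $\|f\| \ge a$. Hence we must have $\|f\| = a$ as claimed.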

In summary, we have shown that $c_0^* = l_1$. In Exercise 12.6.2 the reader is asked to show that $l_1^* = l_\infty$, and hence this shows that $c_0^{**} = l_\infty$. ∆
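The lower-bound half of the norm computation can be watched numerically. In the sketch below, the coefficients $f_i = (-1)^i/2^i$ are a hypothetical choice with $a = \sum_i |f_i| = 1$, and the helper names are our own. The truncated sign vectors $x$ lie in $c_0$ and have $\|x\| = 1$. They give $|f(x)| = \sum_{i \le N} |f_i| = 1 - 2^{-N}$, which increases toward $a$ without exceeding it.

```python
def coeff(i):
    """Hypothetical coefficients f_i = (-1)^i / 2^i, so that a = sum |f_i| = 1."""
    return (-1) ** i / 2 ** i

def sign_vector(N):
    """x_i = sign(f_i) for 1 <= i <= N; the entries beyond N are 0 and omitted."""
    return [(coeff(i) > 0) - (coeff(i) < 0) for i in range(1, N + 1)]

def f(x):
    """f(x) = sum f_i x_i for a finitely supported x (indices start at 1)."""
    return sum(coeff(i) * xi for i, xi in enumerate(x, start=1))

a = 1.0
Ns = [1, 5, 10, 30]
values = [abs(f(sign_vector(N))) for N in Ns]
```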



We now proceed to define the adjoint of an operator exactly as we did in Section 10.1. Therefore, let $H$ be a Hilbert space and let $T \in L(H, H)$ be a linear operator on $H$. If $y \in H$ is an arbitrary but fixed vector, then the mapping $x \mapsto \langle y, Tx\rangle$ is a linear functional on $H$, and it is continuous because both $T$ and the inner product are. We can thus apply the Riesz representation theorem to conclude that there exists a unique vector $z \in H$ such that $\langle y, Tx\rangle = \langle z, x\rangle$ for all $x \in H$. We now define the adjoint of $T$ by $T^\dagger y = z$. Since $y$ was arbitrary, we see that the definition of $T^\dagger$ may be stated as

$$\langle T^\dagger y, x\rangle = \langle y, Tx\rangle \quad \text{for all } x, y \in H.$$
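In finite dimensions, relative to an orthonormal basis, this adjoint is represented by the conjugate transpose matrix. The sketch below checks the defining identity $\langle T^\dagger y, x\rangle = \langle y, Tx\rangle$ under that representation. It assumes $H = \mathbb{C}^3$ with operators written as matrices in the standard basis. The matrix `T`, the vectors `x`, `y` and the helper names are arbitrary choices made for illustration.

```python
def inner(y, x):
    """<y, x> = sum conj(y_i) x_i."""
    return sum(complex(yi).conjugate() * xi for yi, xi in zip(y, x))

def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def adjoint(T):
    """Conjugate transpose: (T^dagger)_{ij} = conj(T_{ji})."""
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T[0]))]

T = [[1 + 1j, 2, 0], [0, -1j, 3 - 2j], [4, 0, 1]]
x = [1 - 1j, 2j, -1]
y = [3, 1 + 1j, -2j]

lhs = inner(apply(adjoint(T), y), x)   # <T^dagger y, x>
rhs = inner(y, apply(T, x))            # <y, T x>
```

Applying `adjoint` twice returns the original matrix, which is the finite-dimensional form of $T^{\dagger\dagger} = T$.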
To prove $T^\dagger$ is unique, suppose that $T' \in L(H, H)$ is defined by $\langle T'y, x\rangle = \langle y, Tx\rangle$. Then $\langle T'y, x\rangle = \langle T^\dagger y, x\rangle$ and hence $\langle T'y - T^\dagger y, x\rangle = 0$. But this must hold for all $x \in H$, and therefore $T'y - T^\dagger y = 0$, i.e., $T'y = T^\dagger y$. Since this is true for all $y$, it then follows that $T' = T^\dagger$.

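Let us show that $T^\dagger$ as we have defined it is really an element of $L(H, H)$. In other words, we must show that $T^\dagger$ is both linear and continuous. Linearity is easy since for any $x, y, z \in H$ and $a \in \mathbb{C}$ we have

$$\langle T^\dagger(x + y), z\rangle = \langle x + y, Tz\rangle = \langle x, Tz\rangle + \langle y, Tz\rangle = \langle T^\dagger x, z\rangle + \langle T^\dagger y, z\rangle = \langle T^\dagger x + T^\dagger y, z\rangle$$

and

$$\langle T^\dagger(ax), y\rangle = \langle ax, Ty\rangle = a^*\langle x, Ty\rangle = a^*\langle T^\dagger x, y\rangle = \langle (aT^\dagger)x, y\rangle.$$

Therefore

$$T^\dagger(x + y) = T^\dagger x + T^\dagger y$$

and

$$T^\dagger(ax) = (aT^\dagger)x.$$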

To prove the continuity of $T^\dagger$, we first show that it is bounded. Using the Cauchy-Schwarz inequality we have

$$\|T^\dagger x\|^2 = \langle T^\dagger x, T^\dagger x\rangle = \langle x, TT^\dagger x\rangle \le \|x\|\,\|TT^\dagger x\| \le \|x\|\,\|T\|\,\|T^\dagger x\|$$

which shows that $\|T^\dagger x\| \le \|T\|\,\|x\|$ for all $x \in H$ (the case $T^\dagger x = 0$ being trivial). Since $\|T\| < \infty$, this shows that $T^\dagger$ is continuous (Theorem 12.12). We can therefore define the norm of $T^\dagger$ in the usual manner to obtain

$$\|T^\dagger\| = \sup\{\|T^\dagger x\| : \|x\| = 1\} \le \|T\|.$$

In fact, we will show in the next theorem that $\|T^\dagger\| = \|T\|$.

Theorem 12.27 Suppose $S$ and $T$ are operators on a Hilbert space $H$. Then there exists a unique linear operator $T^\dagger$ on $H$ defined by $\langle T^\dagger y, x\rangle = \langle y, Tx\rangle$ for all $x, y \in H$. Moreover, this operator has the following properties:

(a) $(S + T)^\dagger = S^\dagger + T^\dagger$.

(b) $(aT)^\dagger = a^*T^\dagger$.

(c) $(ST)^\dagger = T^\dagger S^\dagger$.

(d) $T^{\dagger\dagger} = (T^\dagger)^\dagger = T$.

(e) $\|T^\dagger\| = \|T\|$.

(f) $\|T^\dagger T\| = \|T\|^2$.

(g) $0^\dagger = 0$ and $1^\dagger = 1$.

(h) If $T$ is invertible, then $(T^\dagger)^{-1} = (T^{-1})^\dagger$.

Proof The existence and uniqueness of $T^\dagger$ was shown in the above discussion. Properties (a) through (d) and (g) through (h) follow exactly as in Theorem 10.3. As to property (e), we just showed that $\|T^\dagger\| \le \|T\|$, and hence together with property (d), this also shows that $\|T\| = \|(T^\dagger)^\dagger\| \le \|T^\dagger\|$. To prove (f), we first note that the basic properties of the norm along with property (e) show that

$$\|T^\dagger T\| \le \|T^\dagger\|\,\|T\| = \|T\|^2.$$

To show that $\|T\|^2 \le \|T^\dagger T\|$, we observe that the Cauchy-Schwarz inequality yields

$$\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^\dagger Tx, x\rangle \le \|T^\dagger Tx\|\,\|x\| \le \|T^\dagger T\|\,\|x\|^2$$

which (by definition of $\|T\|$) implies $\|T\| \le \|T^\dagger T\|^{1/2}$. ∎

While we have defined the adjoint in the most direct manner possible, we should point out that there is a more general approach that is similar to our discussion of the transpose mapping defined in Theorem 9.7. This alternative method shows that a linear operator $T$ defined on a Banach space $E$ leads to a "conjugate" operator $T^*$ defined on the dual space $E^*$. Furthermore, the mapping $T \mapsto T^*$ is a norm-preserving isomorphism of $L(E, E)$ into $L(E^*, E^*)$. However, in the case of a Hilbert space, Theorem 12.26 gives us an isomorphism between $H$ and $H^*$, and hence we can consider $T^*$ to be an operator on $H$ itself, and we therefore define $T^\dagger = T^*$. For the details of this approach, the reader is referred to, e.g., the very readable treatment by Simmons (1963).

Exercises 

1. Let $H$ be a Hilbert space. We define a mapping $H \to H^{**}$ by $x \mapsto F_x$ where $F_x \in H^{**}$ is defined by $F_x(f) = f(x)$. We can also consider the composite mapping $H \to H^* \to H^{**}$ defined by $x \mapsto f_x \mapsto F_{f_x}$ where $f_x(y) = \langle y, x\rangle$ and $F_{f_x}(f) = \langle f, f_x\rangle$.

(a) Show that $H^{**}$ is a Hilbert space with the inner product $\langle F_f, F_g\rangle = \langle g, f\rangle$.

(b) By considering the two mappings defined above, show that $H$ is reflexive.

(c) Show $\langle F_x, F_y\rangle = \langle x, y\rangle$.
2. (a) Show $(l_1^n)^* = l_\infty^n$ and $(l_\infty^n)^* = l_1^n$.

(b) Show $l_1^* = l_\infty$ and $[\dots] = l_1$. [Hint: Refer to Example 12.11.]

3. Let $V$ be infinite-dimensional with orthonormal basis $\{e_i\}$. Define $T \in L(V)$ by $Te_i = e_{i-1}$. Show $T^\dagger e_i = e_{i+1}$.

12.7 HERMITIAN, NORMAL AND UNITARY OPERATORS 

Let us denote the space $L(H, H)$ of continuous linear maps of $H$ into itself by $\mathcal{L}(H)$. In other words, $\mathcal{L}(H)$ consists of all operators on $H$. As any physics student knows, the important operators $A \in \mathcal{L}(H)$ are those for which $A^\dagger = A$. These operators are called self-adjoint or Hermitian. In fact, we now show that the set of all Hermitian operators on $H$ is a closed real subspace of $\mathcal{L}(H)$.

Theorem 12.28 The set of all Hermitian operators on a Hilbert space $H$ forms a real closed subspace of $\mathcal{L}(H)$. Moreover, this subspace is a real Banach space containing the identity transformation on $H$.

Proof We showed in Theorem 12.27 that 0 and 1 are Hermitian. If $A, B \in \mathcal{L}(H)$ are Hermitian operators and $\alpha, \beta \in \mathbb{R}$, then (since $\alpha^* = \alpha$ and $\beta^* = \beta$)

$$(\alpha A + \beta B)^\dagger = (\alpha A)^\dagger + (\beta B)^\dagger = \alpha A^\dagger + \beta B^\dagger = \alpha A + \beta B$$

so that $\alpha A + \beta B$ is also Hermitian, and hence the set of Hermitian operators forms a real subspace of $\mathcal{L}(H)$. If $\{A_n\}$ is a sequence of Hermitian operators with the property that $A_n \to A \in \mathcal{L}(H)$, then (using $A_n^\dagger = A_n$ and Theorem 12.27(e))

$$\begin{aligned}
\|A - A^\dagger\| &\le \|A - A_n\| + \|(A_n - A)^\dagger\| \\
&= \|A - A_n\| + \|A_n - A\| \\
&= 2\|A_n - A\|.
\end{aligned}$$

Since $\|A_n - A\| \to 0$ as $n \to \infty$, this shows that $\|A - A^\dagger\| = 0$, so $A = A^\dagger$ and hence $A$ is also Hermitian. Therefore the subspace of all Hermitian operators on $H$ is closed (Theorem B14(a)).

Finally, since $\mathcal{L}(H)$ is a Banach space (Theorem 12.13), the fact that the closed subspace of Hermitian operators forms a real Banach space follows from Theorem 12.9. ∎
It should be clear by now that most of the basic properties of Hermitian operators on an infinite-dimensional Hilbert space are exactly the same as in the finite-dimensional case discussed in Chapter 10. In particular, the proofs of Theorems 10.4, 10.9(a) and 10.11(a) all carry over verbatim to the infinite-dimensional case.

Recall from Section 10.3 that an operator $N$ on $H$ is said to be normal if $N$ and $N^\dagger$ commute, i.e., if $N^\dagger N = NN^\dagger$. It should be obvious that any Hermitian operator is necessarily normal, and that $\alpha N$ is normal for any scalar $\alpha$. However, even if $N_1$ and $N_2$ are normal, it is not generally true that either $N_1 + N_2$ or $N_1N_2$ is normal, and hence the subset of $\mathcal{L}(H)$ consisting of normal operators is not a subspace. We do however have the following two results.
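These failures already occur in $\mathbb{C}^2$. The sketch below uses three hypothetical normal matrices:

- a Hermitian $H$;
- a real antisymmetric $K$, so that $K^\dagger = -K$;
- a diagonal $D$.

It shows that $H + K$ and $DH$ are not normal. Consistent with Theorem 12.30 below, neither pair satisfies that theorem's commutation hypothesis. The helper names are our own.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def commute(A, B, tol=1e-12):
    AB, BA = matmul(A, B), matmul(B, A)
    return all(abs(AB[i][j] - BA[i][j]) < tol
               for i in range(len(AB)) for j in range(len(AB[0])))

def is_normal(A):
    return commute(A, dagger(A))

H = [[0, 1], [1, 0]]     # Hermitian, hence normal
K = [[0, 1], [-1, 0]]    # real antisymmetric: K^dagger = -K, hence normal
D = [[1, 0], [0, 2]]     # diagonal, hence normal

sum_is_normal = is_normal(add(H, K))          # H + K = [[0, 2], [0, 0]]
product_is_normal = is_normal(matmul(D, H))   # D H = [[0, 1], [2, 0]]
```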

Theorem 12.29 The set of all normal operators on $H$ is a closed subset of $\mathcal{L}(H)$ that is also closed under scalar multiplication.

Proof All that remains to be shown is that if $\{N_k\}$ is a sequence of normal operators that converges to an operator $N \in \mathcal{L}(H)$, then $N$ is normal. Since $N_k \to N$, it follows from Theorem 12.27(a) and (e) that $N_k^\dagger \to N^\dagger$. We thus have (using the fact that each $N_k$ is normal)

$$\begin{aligned}
\|NN^\dagger - N^\dagger N\| &\le \|NN^\dagger - N_kN_k^\dagger\| + \|N_kN_k^\dagger - N_k^\dagger N_k\| + \|N_k^\dagger N_k - N^\dagger N\| \\
&= \|NN^\dagger - N_kN_k^\dagger\| + \|N_k^\dagger N_k - N^\dagger N\| \to 0
\end{aligned}$$

which shows that $NN^\dagger = N^\dagger N$, and hence $N$ is normal. ∎

Theorem 12.30 Let $N_1$ and $N_2$ be normal operators with the property that one of them commutes with the adjoint of the other. Then $N_1 + N_2$ and $N_1N_2$ are normal.

Proof Suppose that $N_1N_2^\dagger = N_2^\dagger N_1$. Taking the adjoint of both sides of this equation then shows that $N_2N_1^\dagger = N_1^\dagger N_2$. In other words, the hypothesis of the theorem is equivalent to the statement that both operators commute with the adjoint of the other. The rest of the proof is left to the reader (see Exercise 12.7.1). ∎

Probably the most important remaining type of operator defined on a Hilbert space is the unitary operator. We recall that unitary and isometric operators were defined in Section 10.2, and we suggest that the reader again go through that discussion. Here we will repeat the essential content of that earlier treatment in a concise manner.
We say that an operator $Q \in \mathcal{L}(H)$ is isometric if $\|Qx\| = \|x\|$ for every $x \in H$. Note that we do not require that $Q$ map $H$ onto $H$, and hence $Q^{-1}$ need not exist (at least if we assume that $Q^{-1}$ must be defined on all of $H$). The definition of an isometric operator shows that $\|Qx\| = 0$ if and only if $x = 0$, and hence an isometric operator $Q$ is a one-to-one mapping of $H$ into $H$ (since $Qx = Qy$ implies $\|Q(x - y)\| = 0$, which then implies $x = y$).

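Theorem 12.31 If $Q \in \mathcal{L}(H)$, then the following conditions are equivalent:

(a) $Q^\dagger Q = 1$.

(b) $\langle Qx, Qy\rangle = \langle x, y\rangle$ for all $x, y \in H$.

(c) $\|Qx\| = \|x\|$ for all $x \in H$.

Proof Let $x, y \in H$ be arbitrary.

(a) $\Rightarrow$ (b): $\langle Qx, Qy\rangle = \langle x, Q^\dagger Qy\rangle = \langle x, 1y\rangle = \langle x, y\rangle$.

(b) $\Rightarrow$ (c): $\|Qx\|^2 = \langle Qx, Qx\rangle = \langle x, x\rangle = \|x\|^2$.

(c) $\Rightarrow$ (a): $\|Qx\|^2 = \|x\|^2$ implies $\langle Qx, Qx\rangle = \langle x, x\rangle$, or $\langle x, Q^\dagger Qx\rangle = \langle x, x\rangle$, and hence $\langle x, (Q^\dagger Q - 1)x\rangle = 0$. It now follows from Theorem 10.4(b) that $Q^\dagger Q - 1 = 0$, and hence $Q^\dagger Q = 1$. ∎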

Isometric operators are sometimes defined by the relationship $Q^\dagger Q = 1$, and we saw in Section 10.2 that in a finite-dimensional space this implies that $QQ^\dagger = 1$ also. However, in an infinite-dimensional space, the property $QQ^\dagger = 1$ must be imposed as an additional condition on $Q$ in one way or another. A unitary operator $U \in \mathcal{L}(H)$ is an operator that satisfies $U^\dagger U = UU^\dagger = 1$. Since inverses are unique (if they exist), this implies that an equivalent definition of unitary operators is that they map $H$ onto itself and satisfy $U^\dagger = U^{-1}$. Alternatively, our next theorem shows that we can define a unitary operator as an isometric operator that maps $H$ onto $H$.
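The standard example showing that $Q^\dagger Q = 1$ does not force $QQ^\dagger = 1$ is the right shift on sequences. Compare Exercise 12.6.3, whose operator $T$ is the adjoint of the shift used here.

The sketch below represents finitely supported sequences in $l_2$ as Python lists. That representation and the helper names are our own assumptions.

- `shift_right` sends $e_i$ to $e_{i+1}$.
- Its adjoint `shift_left` sends $e_1$ to 0 and $e_i$ to $e_{i-1}$.

The right shift is isometric with $S^\dagger S = 1$. However, $SS^\dagger$ annihilates the first coordinate, so $S$ is not unitary: it does not map onto $H$, since $e_1$ is not in its range.

```python
import math
from itertools import zip_longest

def inner(y, x):
    """<y, x> for finitely supported sequences; missing entries count as 0."""
    return sum(complex(a).conjugate() * b for a, b in zip_longest(y, x, fillvalue=0))

def norm(x):
    return math.sqrt(inner(x, x).real)

def shift_right(x):
    """S: (x_1, x_2, ...) -> (0, x_1, x_2, ...), so S e_i = e_{i+1}."""
    return [0] + list(x)

def shift_left(x):
    """S^dagger: (x_1, x_2, ...) -> (x_2, x_3, ...), so e_1 -> 0 and e_i -> e_{i-1}."""
    return list(x[1:])

x = [3, -1j, 2 + 2j, 0.5]
y = [1j, 2, 0, -1, 4]

adjoint_gap = abs(inner(shift_left(y), x) - inner(y, shift_right(x)))  # <S^dagger y, x> vs <y, S x>
isometry_gap = abs(norm(shift_right(x)) - norm(x))
left_inverse = shift_left(shift_right(x)) == x     # S^dagger S = 1 holds
right_inverse = shift_right(shift_left(x)) == x    # S S^dagger = 1 fails
```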

Theorem 12.32 An operator $U \in \mathcal{L}(H)$ is unitary if and only if it is a one-to-one isometry of $H$ onto itself. In other words, $U \in \mathcal{L}(H)$ is unitary if and only if it is a bijective isometry.

Proof If $U$ is unitary then it maps $H$ onto itself, and since $U^\dagger U = 1$, we see from Theorem 12.31 that $\|Ux\| = \|x\|$. Therefore $U$ is an isometric isomorphism of $H$ onto $H$.

Conversely, if $U$ is an isometric isomorphism of $H$ onto $H$ then $U^{-1}$ exists, and the fact that $U$ is isometric shows that $U^\dagger U = 1$ (Theorem 12.31). Multiplying from the right by $U^{-1}$ shows that $U^\dagger = U^{-1}$, and hence $U^\dagger U = UU^\dagger = 1$ so that $U$ is unitary. ∎






One reassuring fact about unitary operators in a Hilbert space is that they also obey the analogue of Theorem 10.6. In other words, an operator $U$ on a Hilbert space $H$ is unitary if and only if $\{Ue_i\}$ is a complete orthonormal set whenever $\{e_i\}$ is (see Exercise 12.7.2).

For Hermitian or unitary operators on an infinite-dimensional space, there is no treatment of eigenvalues that comes even close to the all-encompassing results available for finite-dimensional spaces (e.g., Theorems 10.21 or 10.26). Unfortunately, most of the general results that are known are considerably more difficult to establish in the infinite-dimensional case. In fact, a proper treatment involves a detailed discussion of many subjects which the ambitious reader will have to study on his or her own.



Exercises 

1. Finish the proof of Theorem 12.30.

2. Prove that $U \in \mathcal{L}(H)$ is unitary if and only if $\{Ue_i\}$ is a complete orthonormal set whenever $\{e_i\}$ is.
APPENDIX A

Metric Spaces



For those readers not already familiar with the elementary properties of metric spaces and the notion of compactness, this appendix presents a sufficiently detailed treatment for a reasonable understanding of this subject matter. For those who have already had some exposure to elementary point set topology (or even a solid introduction to real analysis), the material in this appendix should serve as a useful review of some basic concepts. Besides, any mathematics or physics student should become thoroughly familiar with all of this material.

Let $S$ be any set. Then a function $d: S \times S \to \mathbb{R}$ is said to be a metric on $S$ if it has the following properties for all $x, y, z \in S$:

(M1) $d(x, y) \ge 0$;

(M2) $d(x, y) = 0$ if and only if $x = y$;

(M3) $d(x, y) = d(y, x)$;

(M4) $d(x, y) + d(y, z) \ge d(x, z)$.

The real number $d(x, y)$ is called the distance between $x$ and $y$, and the set $S$ together with a metric $d$ is called a metric space $(S, d)$.

As a simple example, let $S = \mathbb{R}$ and let $d(x, y) = |x - y|$ for all $x, y \in \mathbb{R}$. From the properties of the absolute value, conditions (M1) through (M3) should be obvious, and (M4) follows by simply noting that

$$|x - z| = |x - y + y - z| \le |x - y| + |y - z|.$$

For our purposes, we point out that given any normed vector space $(V, \|\cdot\|)$ we may treat $V$ as a metric space by defining

$$d(x, y) = \|x - y\|$$

for every $x, y \in V$. Using Theorem 2.17, the reader should have no trouble showing that this does indeed define a metric space $(V, d)$. In fact, it is easy to see that $\mathbb{R}^n$ forms a metric space relative to the standard inner product and its associated norm.

Given a metric space $(X, d)$ and any real number $r > 0$, the open ball of radius $r$ and center $x_0$ is the set $B_d(x_0, r) \subset X$ defined by

$$B_d(x_0, r) = \{x \in X : d(x, x_0) < r\}.$$

Since the metric $d$ is usually understood, we will generally leave off the subscript $d$ and simply write $B(x_0, r)$. Such a set is frequently referred to as an $r$-ball. We say that a subset $U$ of $X$ is open if, given any point $x \in U$, there exists $r > 0$ and an open ball $B(x, r)$ such that $B(x, r) \subset U$.

Probably the most common example of an open set is the open unit disk $D_1$ in $\mathbb{R}^2$ defined by

$$D_1 = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}.$$

We see that given any point $x_0 \in D_1$, we can find an open ball $B(x_0, r) \subset D_1$ by choosing $r = 1 - d(x_0, 0)$. The set

$$D_2 = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$$

is not open because there is no open ball centered on any of the boundary points $x^2 + y^2 = 1$ that is contained entirely within $D_2$.

The fundamental characterizations of open sets are contained in the 
following three theorems. 

Theorem A1 Let $(X, d)$ be a metric space. Then any open ball is an open set.

Proof Let $B(x_0, r)$ be an open ball in $X$ and let $x$ be any point in $B(x_0, r)$. We must find a $B(x, r')$ contained in $B(x_0, r)$.

Since $d(x, x_0) < r$, we define $r' = r - d(x, x_0) > 0$. Then for any $y \in B(x, r')$ we have $d(y, x) < r'$, and hence

$$d(y, x_0) \le d(y, x) + d(x, x_0) < r' + d(x, x_0) = r$$

which shows that $y \in B(x_0, r)$. Therefore $B(x, r') \subset B(x_0, r)$. ∎
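The radius $r' = r - d(x, x_0)$ in this proof can be spot-checked in $\mathbb{R}^2$ with the Euclidean metric, which is an assumption made only for illustration. The sketch below samples points of $B(x, r')$ and confirms that each lies in $B(x_0, r)$. The helper names are our own, and random sampling is evidence rather than a proof.

```python
import math
import random

def dist(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def inner_ball_ok(x0, r, x, trials=5000, seed=2):
    """Sample y in B(x, r') with r' = r - d(x, x0) and check that y lies in B(x0, r)."""
    rng = random.Random(seed)
    r_prime = r - dist(x, x0)
    assert r_prime > 0, "x must lie in B(x0, r)"
    for _ in range(trials):
        # a random point strictly inside the disk of radius r' about x
        rho = r_prime * math.sqrt(rng.random()) * 0.999999
        theta = rng.uniform(0, 2 * math.pi)
        y = (x[0] + rho * math.cos(theta), x[1] + rho * math.sin(theta))
        if dist(y, x0) >= r:
            return False
    return True
```

For example, `inner_ball_ok((0.0, 0.0), 1.0, (0.9, 0.0))` tests a point close to the boundary of the unit disk.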

Theorem A2 Let $(X, d)$ be a metric space. Then

(a) Both $X$ and $\varnothing$ are open sets.

(b) The intersection of a finite number of open sets is open.

(c) The union of an arbitrary number of open sets is open.

Proof (a) $X$ is clearly open since for any $x \in X$ and $r > 0$ we have $B(x, r) \subset X$. The statement that $\varnothing$ is open is also automatically satisfied since for any $x \in \varnothing$ (there are none) and $r > 0$, we again have $B(x, r) \subset \varnothing$.

(b) Let $\{U_i\}$, $i \in I$, be a finite collection of open sets in $X$. Suppose $\{U_i\}$ is empty. Then $\bigcap U_i = X$ because a point is in the intersection of a collection of sets if it belongs to each set in the collection, so if there are no sets in the collection, then every point of $X$ satisfies this requirement. Hence $\bigcap U_i = X$ is open by (a). Now assume that $\{U_i\}$ is not empty, and let $U = \bigcap U_i$. If $U = \varnothing$ then it is open by (a), so assume that $U \ne \varnothing$. Suppose $x \in U$ so that $x \in U_i$ for every $i \in I$. Therefore there exists $B(x, r_i) \subset U_i$ for each $i$, and since there are only a finite number of the $r_i$ we may let $r = \min\{r_i\}$. It follows that

$$B(x, r) \subset B(x, r_i) \subset U_i$$

for every $i$, and hence $B(x, r) \subset \bigcap U_i = U$. In other words, we have found an open ball centered at each point of $U$ and contained in $U$, thus proving that $U$ is open.
(c) Let $\{U_i\}$ be an arbitrary collection of open sets. If $\{U_i\}$ is empty, then $U = \bigcup U_i = \varnothing$ is open by (a). Now suppose that $\{U_i\}$ is not empty and $x \in \bigcup U_i$. Then $x \in U_i$ for some $i$, and hence there exists $B(x, r_i) \subset U_i \subset \bigcup U_i$ so that $\bigcup U_i$ is open. ∎

Notice that part (b) of this theorem requires that the collection be finite. To see the necessity of this, consider the infinite collection of intervals in $\mathbb{R}$ given by $(-1/n, 1/n)$ for $1 \le n < \infty$. The intersection of these sets is the point $\{0\}$, which is not open in $\mathbb{R}$.

In an arbitrary metric space the structure of the open sets can be very complicated. However, the most general description of an open set is contained in the following.

Theorem A3 A subset U of a metric space (X, d) is open if and only if it is 
the union of open balls. 

Proof Assume $U$ is the union of open balls. By Theorem A1 each open ball is an open set, and hence $U$ is open by Theorem A2(c). Conversely, let $U$ be an open subset of $X$. For each $x \in U$ there exists at least one ball $B(x, r_x) \subset U$, so that $\bigcup_{x \in U} B(x, r_x) \subset U$. On the other hand, each $x \in U$ is contained in at least $B(x, r_x)$, so that $U \subset \bigcup_{x \in U} B(x, r_x)$. Therefore $U = \bigcup_{x \in U} B(x, r_x)$. ∎

As a passing remark, note that a set is never open in and of itself. Rather, a set is open only with respect to a specific metric space containing it. For example, the set of numbers $[0, 1)$ is not open when considered as a subset of the real line because any open interval about the point 0 contains points not in $[0, 1)$. However, if $[0, 1)$ is considered to be the entire space $X$, then it is open by Theorem A2(a).

If $U$ is an open subset of a metric space $(X, d)$, then its complement $U^c = X - U$ is said to be closed. In other words, a set is closed if and only if its complement is open. For example, a moment's thought should convince you that the subset of $\mathbb{R}^2$ defined by $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$ is a closed set. The closed ball of radius $r$ centered at $x_0$ is the set $B[x_0, r]$ defined in the obvious way by

$$B[x_0, r] = \{x \in X : d(x_0, x) \le r\}.$$

We leave it to the reader (see Exercise A.3) to prove the closed set analogue of Theorem A2. The important difference to realize is that the intersection of an arbitrary number of closed sets is closed, while only the union of a finite number of closed sets is closed.

If $(X, d)$ is a metric space and $Y \subset X$, then $Y$ may be considered a metric space in its own right with the same metric $d$ used on $X$. In other words, if we let $d|_Y$ denote the metric $d$ restricted to points in $Y$, then the space $(Y, d|_Y)$ is said to be a subspace of the metric space $(X, d)$.

Theorem A4 Let $(X, d)$ be a metric space and $(Y, d|_Y)$ a metric subspace of $X$. Then a subset $W \subset Y$ is open in $Y$ (i.e., open with respect to the metric $d|_Y$) if and only if $W = Y \cap U$ where $U$ is open in $X$.

Proof Let $W \subset Y$ be open in $Y$ and suppose $x \in W$. Then there exists $r_x > 0$ such that the set

$$B_{d|_Y}(x, r_x) = \{y \in Y : (d|_Y)(x, y) < r_x\}$$

is a subset of $W$. But this is clearly the same as the open set

$$B_d(x, r_x) = \{y \in X : d(x, y) < r_x\}$$

restricted to only those points $y$ that are in $Y$. Another way of saying this is that

$$B_{d|_Y}(x, r_x) = B_d(x, r_x) \cap Y.$$

Since $W = \bigcup_{x \in W} B_{d|_Y}(x, r_x)$, it follows that (see Exercise 0.1.1(b)) $W = U \cap Y$ where $U = \bigcup_{x \in W} B_d(x, r_x)$ is open in $X$ (by Theorem A2(c)).

On the other hand, let $W = Y \cap U$ where $U$ is open in $X$, and suppose $x \in W$. Then $x \in U$ so there exists $r > 0$ with

$$B_d(x, r) = \{y \in X : d(x, y) < r\} \subset U.$$

But $Y \cap B_d(x, r)$ is just $B_{d|_Y}(x, r) = \{y \in Y : (d|_Y)(x, y) < r\} \subset W$, which shows that $W$ is open in $Y$. ∎

Note that all of our discussion on metric spaces also applies to normed vector spaces where $d(x, y) = \|x - y\|$. Because of this, we can equally well discuss open sets in any normed space $V$.

Let $f: (X, d_X) \to (Y, d_Y)$ be a mapping. We say that $f$ is continuous at $x_0 \in X$ if, given any real number $\varepsilon > 0$, there exists a real number $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \varepsilon$ for every $x \in X$ with $d_X(x, x_0) < \delta$. Equivalently, $f$ is continuous at $x_0$ if for each $B(f(x_0), \varepsilon)$ there exists $B(x_0, \delta)$ such that $f(B(x_0, \delta)) \subset B(f(x_0), \varepsilon)$. (Note that these open balls are defined with respect to two different metrics since they are in different spaces. We do not want to clutter the notation by adding subscripts such as $d_X$ and $d_Y$ to $B$.) In words, "if you tell me how close you wish to get to the number $f(x_0)$, then I will tell you how close $x$ must be to $x_0$ in order that $f(x)$ be that close." If $f$ is defined on a subset $S \subset X$, then $f$ is said to be continuous on $S$ if $f$ is continuous at every point of $S$.

For example, consider the mapping $f: (0, \infty) \subset \mathbb{R} \to (0, \infty) \subset \mathbb{R}$ defined by $f(x) = 1/x$. For any $x_0 \in (0, \infty)$ we have (using the absolute value as our metric)

$$|f(x) - f(x_0)| = |1/x - 1/x_0| = |x_0 - x|/|xx_0|.$$

If $x$ is such that $|x - x_0| < \delta$, then we see that

$$|f(x) - f(x_0)| < \delta/|xx_0| = \delta/(xx_0).$$

In particular, choosing $\delta \le x_0/2$, it follows that $x > x_0/2$ (since $|x - x_0| < \delta \le x_0/2$), and hence $\delta/(xx_0) < 2\delta/x_0^2$. Therefore, given any $\varepsilon > 0$, if we pick $\delta = \min\{x_0/2, \varepsilon x_0^2/2\}$ then we will have $|f(x) - f(x_0)| < \varepsilon$.
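The choice of $\delta$ can be spot-checked by sampling. The sketch below uses hypothetical helper names and offers random sampling only as evidence, not a proof. It confirms that $\delta = \min\{x_0/2, \varepsilon x_0^2/2\}$ works at a few sample points, and that the naive choice $\delta = x_0/2$, which ignores $\varepsilon$, fails for $x_0 = 1$, $\varepsilon = 0.1$.

```python
import random

def delta_for(x0, eps):
    """The delta chosen in the text: min{x0/2, eps*x0^2/2}."""
    return min(x0 / 2, eps * x0 ** 2 / 2)

def violates(x0, eps, delta, trials=10_000, seed=1):
    """True if some sampled x with |x - x0| < delta has |1/x - 1/x0| >= eps."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = x0 + rng.uniform(-delta, delta) * 0.999999   # keep |x - x0| < delta
        if abs(1 / x - 1 / x0) >= eps:
            return True
    return False

cases = [(0.01, 1e-3), (1.0, 0.1), (50.0, 1e-4)]
text_choice_ok = [not violates(x0, eps, delta_for(x0, eps)) for x0, eps in cases]
naive_fails = violates(1.0, 0.1, 0.5)   # delta = x0/2 alone ignores eps
```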

Fortunately, one can usually tell by inspection (i.e., by drawing a picture if necessary) whether or not a particular function is continuous without resorting to clever calculations. The general definition is a powerful technique for proving theorems about classes of continuous functions satisfying given properties. Moreover, there is an intrinsic way to characterize continuous mappings that is of the utmost importance.

Theorem A5 Let $f: (X, d_X) \to (Y, d_Y)$. Then $f$ is continuous if and only if
$f^{-1}(U)$ is open in $X$ for all open sets $U$ in $Y$.

Proof Suppose $f$ is continuous and $U$ is an open subset of $Y$. If $x \in f^{-1}(U)$,
then $f(x) \in U$, so there exists $B(f(x), \varepsilon) \subset U$ (since $U$ is open). But the
continuity of $f$ then implies that there exists $B(x, \delta)$ such that

$$f(B(x, \delta)) \subset B(f(x), \varepsilon) \subset U .$$

Therefore $B(x, \delta)$ is an open ball centered on $x$ and contained in $f^{-1}(U)$, and hence
$f^{-1}(U)$ is open.

Conversely, assume that $f^{-1}(U)$ is open whenever $U$ is, and let $x \in X$ be
arbitrary. Then for any $\varepsilon > 0$ the open ball $B(f(x), \varepsilon)$ is an open set, so its inverse
image is an open set containing $x$. Therefore there exists an open ball $B(x, \delta)$
contained in this inverse image, and it clearly has the property that
$f(B(x, \delta)) \subset B(f(x), \varepsilon)$, hence proving that $f$ is continuous. ∎

Corollary If $f: (X, d_X) \to (Y, d_Y)$, then $f$ is continuous if and only if $f^{-1}(F)$ is
closed in $X$ whenever $F$ is closed in $Y$.

Proof It was shown in Exercise 0.2.1 that if $A \subset Y$, then $f^{-1}(A^c) = f^{-1}(A)^c$.
Therefore if $F \subset Y$ is closed, then $F = U^c$ for some open set $U \subset Y$, and so by
Theorem A5, $f^{-1}(F) = f^{-1}(U^c) = f^{-1}(U)^c$ must be closed if and only if $f$ is
continuous. ∎

Note that if $f: X \to Y$ is continuous and $U \subset Y$ is open, then $f^{-1}(U)$ is
open, but if $A \subset X$ is open, then it is not necessarily true that $f(A)$ is open. As
a simple example, consider the function $f: \mathbb{R} \to \mathbb{R}^2$ defined by $f(x) = (x, x^2)$. It
should be clear that the open ball $U \subset \mathbb{R}^2$ shown in the figure is an open set whose
inverse image is an open interval in $\mathbb{R}$ (note that some points of $U$ are not
the image under $f$ of any point of $\mathbb{R}$), but that the image under $f$ of an open
interval is part of the parabola $y = x^2$, which is not open as a subset of $\mathbb{R}^2$.




Now suppose that $(X, d)$ is a metric space, and let $\{U_i\}$ be a collection of
open subsets of $X$ such that $\bigcup U_i = X$. Such a collection of subsets is called an
open cover of $X$. A subcollection $\{V_j\}$ of the collection $\{U_i\}$ is said to be an
open subcover of $X$ if $\bigcup V_j = X$. A space $(X, d)$ is said to be compact if every
open cover has a finite subcover. Similarly, given a subset $A \subset X$, a collection
$\{U_i\}$ of open subsets of $X$ with the property that $A \subset \bigcup U_i$ is said to be an
open cover of $A$. Equivalently, the collection $\{U_i\}$ of open subsets of $X$ is an
open cover of $A$ in $X$ if the collection $\{U_i \cap A\}$ is an open cover of the subset
$A$ in the metric $d|A$ (i.e., in the subspace $A$). We then say that $A$ is compact if
every open cover of $A$ has a finite subcover, or equivalently, $A$ is compact if
the subspace $A$ is compact. While this is not a particularly easy concept to
thoroughly understand and appreciate without detailed study, its importance to
us is based on the following two examples.

Example A1 Consider the subset $A = (0, 1)$ of the real line $\mathbb{R}$. We define the
collection $\{U_1, U_2, \dots\}$ of open sets by

$$U_n = (1/2^{n+1},\ 1 - 1/2^{n+1}) .$$

Thus $U_1 = (1/4, 3/4)$, $U_2 = (1/8, 7/8)$, etc. The collection $\{U_n\}$ clearly covers $A$
since for any $x \in (0, 1)$ we can always find some $U_n$ such that $x \in U_n$.
However, $A$ is not compact, since given any finite number of the $U_n$ there
exists $\varepsilon > 0$ (with $\varepsilon \in (0, 1)$) which is not in any of them: the $U_n$ are nested,
so finitely many of them cover only the largest one, $U_N$, and $\varepsilon = 1/2^{N+2}$ lies
outside it. ∆
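The failure of compactness in Example A1 is concrete enough to compute. The following Python sketch (with hypothetical helper names) uses exact rational arithmetic from the standard `fractions` module to confirm, for finitely many subcollections $U_1, \dots, U_N$, that the point $1/2^{N+2}$ escapes all of them.

```python
from fractions import Fraction


def U(n: int):
    """Endpoints of the open interval U_n = (1/2**(n+1), 1 - 1/2**(n+1))."""
    t = Fraction(1, 2 ** (n + 1))
    return t, 1 - t


def covered(x: Fraction, indices) -> bool:
    """True if x lies in at least one of the open intervals U_n, n in indices."""
    return any(lo < x < hi for lo, hi in (U(n) for n in indices))


def uncovered_point(indices) -> Fraction:
    """A point of (0, 1) missed by the finite subcollection {U_n : n in indices}.

    The U_n are nested, so their union is U_N with N = max(indices),
    and 1/2**(N+2) lies to the left of U_N.
    """
    N = max(indices)
    return Fraction(1, 2 ** (N + 2))
```

The missed point is always covered by a later set such as $U_{N+2}$, which is why the whole collection still covers $(0, 1)$.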

Example A2 Let us show that the subspace $[0, 1]$ of the real line is compact.
This is sometimes called the Heine-Borel theorem, although we shall prove a
more general version below.

First note that the points $0$ and $1$, which are included in the subspace $[0, 1]$,
are not in the set $(0, 1)$ discussed in the previous example. However, if we
have positive real numbers $a$ and $b$ with $a < b < 1$, then the collection $\{U_n\}$
defined above together with the sets $[0, a)$ and $(b, 1]$ does indeed form an open
cover for $[0, 1]$ (the sets $[0, a)$ and $(b, 1]$ are open in the subspace $[0, 1]$ by Theorem A4).
It should be clear that given the sets $[0, a)$ and $(b, 1]$ we can now choose a finite cover
of $[0, 1]$ by including these sets along with a finite number of the $U_n$. To
prove that $[0, 1]$ is compact, however, we must show that any open cover has a
finite subcover.

Somewhat more generally, let $\{O_n\}$ be any open cover of the interval
$[a, b]$ in $\mathbb{R}$. Define

$$A = \{x \in [a, b] : [a, x] \text{ is covered by a finite number of the } O_n\} .$$

We see that $A \neq \emptyset$ since clearly $a \in A$, and furthermore $A$ is bounded above
by $b$. Therefore (by the Archimedean axiom) $A$ must have a least upper bound
$m = \sup A \le b$. If $[a, b]$ is to be compact, then we must have $b \in A$. We will show
that this is true by first proving that $m \in A$, and then that $m = b$.

Since $\{O_n\}$ covers $[a, b]$ and $m \in [a, b]$, it follows that $m \in O_m$ for some
$O_m \in \{O_n\}$. Now, $O_m$ is an open subset of $[a, b]$, and hence there are points in
$O_m$ that are less than $m$, and points in $O_m$ that are greater than $m$.



[Figure: the points $a < x < m < y < b$ on the real line, with the open set $O_m$ containing $m$]

Since $m = \sup A$, there is an $x < m$ with $x \in O_m$ such that the interval $[a, x]$ is
covered by a finite number of the $O_n$, while $[x, m]$ is covered by the single set
$O_m$. Therefore $[a, m]$ is covered by a finite number of open sets, so that $m \in A$.

Now suppose that $m < b$. Then there is a point $y$ with $m < y < b$ such that
$[m, y] \subset O_m$. But we just showed that $m \in A$, so the interval $[a, m]$ is covered
by finitely many $O_n$ while $[m, y]$ is covered by $O_m$. Therefore $y \in A$, which
contradicts the definition of $m$, and hence we must have $m = b$. ∆
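For the special case of a cover by finitely many open intervals, the idea of this proof (push the right end of a finitely covered interval $[a, x]$ forward one open set at a time) can be sketched as a greedy procedure in Python. This is an illustration under that assumption, not a replacement for the proof, which handles arbitrary (possibly infinite) covers. Intervals are represented as hypothetical `(lo, hi)` pairs, and the sets $[0, a)$ and $(b, 1]$ of the earlier discussion are modeled by open intervals of $\mathbb{R}$ that extend past $0$ and $1$.

```python
def greedy_subcover(a, b, intervals):
    """Pick a finite subcover of [a, b] from a finite list of open intervals (lo, hi).

    Invariant: [a, x) is covered by the intervals chosen so far. At each step
    we choose, among the intervals containing x, one reaching furthest right.
    Raises ValueError if some point of [a, b] is not covered.
    """
    chosen = []
    x = a
    while True:
        containing = [iv for iv in intervals if iv[0] < x < iv[1]]
        if not containing:
            raise ValueError(f"the point {x} is not covered")
        best = max(containing, key=lambda iv: iv[1])
        chosen.append(best)
        if best[1] > b:  # the chosen intervals now cover all of [a, b]
            return chosen
        # best[1] itself is not in best (the interval is open), so continue from it
        x = best[1]
```

Each step strictly increases `x` and uses a new interval, so the loop ends after at most `len(intervals)` steps. With $a = 0.1$, $b = 0.9$ and the sets $U_1, \dots, U_{20}$, it returns three sets: $[0, a)$, one of the $U_n$, and $(b, 1]$, in line with the remark made before the proof.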

An important property of metric spaces is the following. Given two
distinct points $x, y \in X$, there exist disjoint open sets $U$ and $V$ in $X$ such that
$x \in U$ and $y \in V$. That this does indeed hold for metric spaces is easy to prove by
considering open balls of radius $d(x, y)/2$ centered on each of the points $x$ and
$y$. This property is called the Hausdorff property. We sometimes refer to a
metric space as a "Hausdorff space" if we wish to emphasize this property.

The following theorems describe some of the most fundamental properties 
of compact spaces. 

Theorem A6 Any closed subset of a compact space is compact.

Proof Let $F \subset X$ be a closed subset of a compact space $X$. If $\{U_i\}$ is any
open cover of $F$, then $\{U_i\} \cup \{F^c\}$ is an open cover of $X$ (note that $F^c$ is open
because $F$ is closed). Since $X$ is compact, we may select a finite subcover by
choosing $F^c$ along with a finite number of the $U_i$. But then $F$ is covered by
this finite subcollection of the $U_i$, and hence $F$ is compact. ∎

Theorem A7 Any compact subset of a metric space is closed.

Proof Let $F$ be a compact subset of a metric space $X$. We will show that $F^c$ is
open. Fix any $x \in F^c$ and suppose $y \in F$. Since $X$ is Hausdorff, there exist
open sets $U_y$ and $V_y$ such that $x \in U_y$, $y \in V_y$ and $U_y \cap V_y = \emptyset$. As the point
$y$ varies over $F$, we see that $\{V_y : y \in F\}$ is an open cover for $F$. Since $F$ is
compact, a finite number, say $V_{y_1}, \dots, V_{y_n}$, will cover $F$. Corresponding to
each $V_{y_i}$ there is a $U_{y_i}$, and we let $U = \bigcap_i U_{y_i}$ and $V = \bigcup_i V_{y_i}$. By construction
$x \in U$, $F \subset V$ and $U \cap V = \emptyset$, and $U$ is open because it is a finite intersection of
open sets. But then $U$ is an open set containing $x$ such that $U \cap F = \emptyset$, and hence
$F^c$ is open. ∎

Theorem A8 Let $(X, d_X)$ be a compact space and let $f$ be a continuous function
from $X$ onto a space $(Y, d_Y)$. Then $Y$ is compact.

Proof Let $\{U_i\}$ be any open cover of $Y$. Since $f$ is continuous, each $f^{-1}(U_i)$ is
open in $X$, and hence $\{f^{-1}(U_i)\}$ is an open cover for $X$. But $X$ is compact, so
a finite number of the $f^{-1}(U_i)$, say $\{f^{-1}(U_{i_1}), \dots, f^{-1}(U_{i_n})\}$, cover $X$.
Since $f$ is onto, $\{U_{i_1}, \dots, U_{i_n}\}$ then form a finite subcover for $Y$, and hence $Y$ is
compact. ∎
Theorem A9 Let $\{K_i\}$ be a collection of compact subsets of a metric space
$X$ such that the intersection of every finite subcollection of $\{K_i\}$ is nonempty.
Then $\bigcap K_i$ is nonempty.

Proof Fix any $K_1 \in \{K_i\}$ and assume that $K_1 \cap (\bigcap_{i \neq 1} K_i) = \emptyset$. We will show
that this leads to a contradiction. First note that by our assumption we have
$(\bigcap_{i \neq 1} K_i) \subset K_1^c$, and hence from Example 0.1 and Theorem 0.1 we see that

$$K_1 \subset \Big(\bigcap_{i \neq 1} K_i\Big)^c = \bigcup_{i \neq 1} K_i^c .$$

Thus $\{K_i^c\}$, $i \neq 1$, is an open cover of $K_1$ (each $K_i^c$ is open since $K_i$ is closed by
Theorem A7). But $K_1$ is compact, so a finite number of these sets, say
$K_{i_1}^c, \dots, K_{i_n}^c$, cover $K_1$. Then

$$K_1 \subset \Big(\bigcup_{\alpha=1}^n K_{i_\alpha}^c\Big) = \Big(\bigcap_{\alpha=1}^n K_{i_\alpha}\Big)^c$$

which implies

$$K_1 \cap K_{i_1} \cap \cdots \cap K_{i_n} = \emptyset .$$

However, this contradicts the hypothesis of the theorem. ∎

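Corollary If $\{K_n\}$ is a sequence of nonempty compact sets such that
$K_n \supset K_{n+1}$, then $\bigcap K_n \neq \emptyset$.

Proof This is an obvious special case of Theorem A9, since the intersection of any
finite subcollection $K_{n_1}, \dots, K_{n_r}$ is just the nonempty set $K_{\max_j n_j}$. ∎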

As a particular application of this corollary we see that if $\{I_n\}$ is a
sequence of nonempty intervals $[a_n, b_n] \subset \mathbb{R}$ such that $I_n \supset I_{n+1}$, then $\bigcap I_n \neq \emptyset$.
While this result is based on the fact that each $I_n$ is compact (Example A2),
we may also prove this directly as follows. If $I_n = [a_n, b_n]$, we let $S = \{a_n\}$.
Then $S \neq \emptyset$ and is bounded above by $b_1$. By the Archimedean axiom, we let
$x = \sup S$. For any $m, n \in \mathbb{Z}^+$ we have $a_n \le a_{m+n} \le b_{m+n} \le b_m$, so that $x \le b_m$
for all $m$. Since $a_m \le x$ for all $m$, we must have $x \in [a_m, b_m] = I_m$ for each
$m = 1, 2, \dots$, so that $\bigcap I_n \neq \emptyset$. We now show that this result holds in $\mathbb{R}^n$ as
well.

Suppose $a, b \in \mathbb{R}^n$ where $a^i < b^i$ for each $i = 1, \dots, n$. By an n-cell we
mean the set of all points $x \in \mathbb{R}^n$ such that $a^i \le x^i \le b^i$ for every $i$. In other
words, an n-cell is just an n-dimensional (closed) rectangle.

Theorem A10 Let $\{I_k\}$ be a sequence of n-cells such that $I_k \supset I_{k+1}$. Then
$\bigcap I_k \neq \emptyset$.

Proof For each $k = 1, 2, \dots$ the n-cell $I_k$ consists of all points $x \in \mathbb{R}^n$ with
the property that $a_k^i \le x^i \le b_k^i$ for every $1 \le i \le n$, so we let $I_k^i = [a_k^i, b_k^i]$.
Now, for each $i = 1, \dots, n$ the sequence $\{I_k^i\}$ satisfies the hypotheses of the
corollary to Theorem A9. Hence for each $i = 1, \dots, n$ there exists
$z^i \in [a_k^i, b_k^i]$ for every $k = 1, 2, \dots$. If we define $z = (z^1, \dots, z^n) \in \mathbb{R}^n$, we see
that $z \in I_k$ for every $k = 1, 2, \dots$. ∎

Theorem A11 Every n-cell is compact.

Proof Let $I$ be an n-cell as defined above, and set $\delta = [\sum_{i=1}^n (b^i - a^i)^2]^{1/2}$.
Then if $x, y \in I$ we have $\|x - y\| \le \delta$ (see Example 2.9). Let $\{U_i\}$ be any open
cover of $I$ and assume that it contains no finite subcover. We will show that
this leads to a contradiction.

Let $c^j = (a^j + b^j)/2$ for each $j = 1, \dots, n$. Then the intervals $[a^j, c^j]$ and $[c^j, b^j]$
determine $2^n$ n-cells $Q_i$ such that $\bigcup Q_i = I$. Since no finite subcollection of the
$U_i$ covers $I$, at least one of the $Q_i$, which we call $I_1$, cannot be covered by
any finite number of the $U_i$. Next we subdivide $I_1$ into another $2^n$ n-cells and
continue in the same manner. We thus obtain a sequence $\{I_\alpha\}$ of n-cells with
the following properties:

(a) $I \supset I_1 \supset I_2 \supset \cdots$;

(b) $I_\alpha$ is not covered by any finite subcollection of the $U_i$;

(c) $x, y \in I_\alpha$ implies $\|x - y\| \le 2^{-\alpha}\delta$.

By (a) and Theorem A10, there exists $z \in \bigcap I_\alpha$, and since $\{U_i\}$ covers $I$,
we must have $z \in U_k$ for some $k$. Now, $U_k$ is an open set in the metric space
$\mathbb{R}^n$, so there exists $\varepsilon > 0$ such that $\|z - y\| < \varepsilon$ implies that $y \in U_k$. If we
choose $\alpha$ so large that $2^{-\alpha}\delta < \varepsilon$ (that this can be done follows from
Theorem 0.3), then (c) implies that $I_\alpha \subset U_k$ (since $z \in I_\alpha$), which contradicts (b). ∎
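The geometric bookkeeping in this proof, namely the $2^n$ subcells and property (c), can be checked with a short Python sketch. Which subcell to keep depends on the cover and is not something we can compute in general, so the sketch only illustrates the subdivision and the halving of $\delta$. Representing an n-cell as a list of pairs $(a^i, b^i)$ is an assumption of the sketch.

```python
import itertools
import math


def diameter(cell):
    """delta = (sum_i (b^i - a^i)**2)**(1/2) for an n-cell [(a^1, b^1), ..., (a^n, b^n)]."""
    return math.sqrt(sum((b - a) ** 2 for a, b in cell))


def subdivide(cell):
    """Split an n-cell into its 2**n subcells using the midpoints c^j = (a^j + b^j)/2."""
    halves = [((a, (a + b) / 2), ((a + b) / 2, b)) for a, b in cell]
    # each subcell picks the lower or upper half independently in every coordinate
    return [list(choice) for choice in itertools.product(*halves)]
```

Keeping any one subcell $k$ times in a row produces a cell of diameter $2^{-k}\delta$, which is property (c).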

We are now in a position to prove the generalized Heine-Borel theorem.
Before doing so, however, we first prove a simple result which is sometimes
taken as the definition of a compact set. By way of terminology, any open set
$U$ containing a point $x$ is said to be a neighborhood of $x$, and the set $U - \{x\}$
is called a deleted neighborhood of $x$. We say that a point $x \in (X, d)$ is an
accumulation point of $A \subset X$ if every deleted neighborhood of $x$ intersects
$A$.

Theorem A12 Any infinite subset $A$ of a compact set $K$ has a point of
accumulation in $K$.

Proof Suppose no point $x \in K$ is an accumulation point of $A$. Then each $x \in K$
has a neighborhood $U_x$ that contains at most a single point of $A$, namely $x$
itself if $x \in A$. The sets $\{U_x\}$ cover $K$, but since $A$ is infinite and each $U_x$ contains
at most one point of $A$, no finite subcollection of $\{U_x\}$ covers $A \subset K$. Hence $K$
cannot possibly be compact. ∎

Theorem A13 A subset $A$ of a metric space $(X, d)$ is closed if and only if $A$
contains all of its accumulation points.

Proof First suppose that $A$ is closed. Let $x \in X$ be an accumulation point of
$A$ and assume that $x \notin A$. Then $x \in A^c$, which is an open set containing $x$ that
does not intersect $A$, and hence contradicts the fact that $x$ is an accumulation
point of $A$. Therefore $x$ must be an element of $A$.

Conversely, suppose $A$ contains all of its accumulation points. We show
that $A^c$ is open. If $x \in A^c$, then $x$ is not an accumulation point of $A$, so
there exists an open set $U$ containing $x$ such that $A \cap U = \emptyset$. But then
$x \in U \subset A^c$, which implies that $A^c$ is open. ∎

We say that a subset $A \subset \mathbb{R}^n$ is bounded if it can be enclosed in some
n-cell. The equivalence of (a) and (b) in the next theorem is called the
(generalized) Heine-Borel theorem, while the equivalence of (a) and (c) is a
general version of the Bolzano-Weierstrass theorem.

Theorem A14 Let $A$ be a subset of $\mathbb{R}^n$. Then the following three properties
are equivalent:

(a) $A$ is closed and bounded.

(b) $A$ is compact.

(c) Every infinite subset of $A$ has a point of accumulation in $A$.

Proof (a) ⇒ (b): If (a) holds, then $A$ can be enclosed by some n-cell, which is
compact by Theorem A11. But then $A$ is compact by Theorem A6.

(b) ⇒ (c): This follows from Theorem A12.

(c) ⇒ (a): We assume that every infinite subset of $A$ has an accumulation
point in $A$. Let us first show that $A$ must be bounded. If $A$ is not bounded,
then for each positive integer $k = 1, 2, \dots$ we can find an $x_k \in A$ such that
$\|x_k\| > k$. Then the set $\{x_k\}$ is clearly infinite but contains no point of
accumulation in $\mathbb{R}^n$, so it certainly contains none in $A$. Hence $A$ must be bounded.

We now show that $A$ must be closed. Again assume the contrary. Then
there exists $x_0 \in \mathbb{R}^n$ which is an accumulation point of $A$ but which does not
belong to $A$ (Theorem A13). This means that for each $k = 1, 2, \dots$ there exists
$x_k \in A$ such that $\|x_k - x_0\| < 1/k$. The set $S = \{x_k\}$ is then an infinite subset of
$A$ with $x_0$ as an accumulation point. Since $x_0 \notin A$, we will be finished if we
can show that $S$ has no accumulation point in $A$ (because the assumption that
$A$ is not closed then leads to a contradiction with the property described in
(c)).

First note that if $a, b \in \mathbb{R}^n$, then Example 2.11 shows us that

$$\|a + b\| = \|a - (-b)\| \ge \|a\| - \|b\| .$$

Using this result, if $y$ is any point of $\mathbb{R}^n$ other than $x_0$ we have

$$\|x_k - y\| = \|x_k - x_0 + x_0 - y\| \ge \|x_0 - y\| - \|x_k - x_0\| > \|x_0 - y\| - 1/k .$$

No matter how large (or small) $\|x_0 - y\|$ is, we can always find a $k_0 \in \mathbb{Z}^+$
such that $1/k < (1/2)\|x_0 - y\|$ for every $k \ge k_0$ (this is just Theorem 0.3). Hence

$$\|x_k - y\| > (1/2)\|x_0 - y\|$$

for every $k \ge k_0$. This shows that $y$ cannot possibly be an accumulation point
of $\{x_k\} = S$ (because the open ball of radius $(1/2)\|x_0 - y\|$ centered at $y$ can
contain at most a finite number of elements of $S$). ∎

We remark that the implication "(a) implies (b)" in this theorem is not true
in an arbitrary metric space (see Exercise A.5).

Let $f$ be a mapping from a set $A$ into $\mathbb{R}^n$. Then $f$ is said to be bounded if
there exists a real number $M$ such that $\|f(x)\| \le M$ for all $x \in A$. If $f$ is a
continuous mapping from a compact space $X$ into $\mathbb{R}^n$, then $f(X)$ is compact
(Theorem A8) and hence closed and bounded (Theorem A14). Thus we see
that any continuous function from a compact set into $\mathbb{R}^n$ is bounded. On the
other hand, note that the function $f: (0, 1) \to \mathbb{R}$ defined by $f(x) = 1/x$ is not
bounded on the interval $(0, 1)$. We also see that the function $g: [0, 1) \to \mathbb{R}$ defined
by $g(x) = x$ never attains a maximum value, although it gets
arbitrarily close to 1. Note that both $f$ and $g$ are defined on non-compact sets.

We now show that a continuous function defined on a compact space takes 
on its maximum and minimum values at some point of the space. 

Theorem A15 Let $f$ be a continuous real-valued function defined on a
compact space $X$, and let $M = \sup_{x \in X} f(x)$ and $m = \inf_{x \in X} f(x)$. Then there exist
points $p, q \in X$ such that $f(p) = M$ and $f(q) = m$.

Proof The above discussion showed that $f(X)$ is a closed and bounded subset
of $\mathbb{R}$. Hence by the Archimedean axiom, $f(X)$ must have a sup and an inf. Let
$M = \sup f(x)$. This means that given $\varepsilon > 0$, there exists $x \in X$ such that

$$M - \varepsilon < f(x) \le M$$

(or else $M - \varepsilon$ would be an upper bound of $f(X)$ smaller than $M$). This just says that any
open ball centered on $M$ intersects $f(X)$, so either $M \in f(X)$ already, or $M$ is an
accumulation point of $f(X)$. In the latter case, since $f(X)$ is closed, Theorem A13 tells
us that $M \in f(X)$. In other words, there exists $p \in X$ such that $M = f(p)$. The proof
for the minimum is identical. ∎

As an application of these ideas, we now prove the Fundamental Theorem 
of Algebra. 

Theorem A16 (Fundamental Theorem of Algebra) The complex number
field $\mathbb{C}$ is algebraically closed.

Proof Consider the non-constant polynomial

$$f(z) = a_0 + a_1 z + \cdots + a_n z^n \in \mathbb{C}[z]$$

where $a_n \neq 0$. Recall that we view $\mathbb{C}$ as the set $\mathbb{R} \times \mathbb{R} = \mathbb{R}^2$, and let $R$ be any
(finite) real number. Then the absolute value function $|f|: \mathbb{C} \to \mathbb{R}$ that takes
any $z \in \mathbb{C}$ to the real number $|f(z)|$ is continuous on the closed ball $B[0, R]$ of
radius $R$ centered at the origin. But $B[0, R]$ is compact (Theorem A14), so that
$|f(z)|$ takes its minimum value at some point of the ball (Theorem A15). On
the other hand, if we write $f(z)$ in the form

$$f(z) = a_n z^n \left( \frac{a_0}{a_n z^n} + \frac{a_1}{a_n z^{n-1}} + \cdots + \frac{a_{n-1}}{a_n z} + 1 \right)$$

we see that $|f(z)|$ becomes arbitrarily large as $|z|$ becomes large. To be precise,
given any real $C > 0$ there exists $R > 0$ such that $|z| > R$ implies $|f(z)| > C$.

We now combine these two facts as follows. Let $z_1$ be arbitrary, and define
$C = |f(z_1)|$. Then there exists $R_0 > 0$ such that $|f(z)| > |f(z_1)|$ for all $z \in \mathbb{C}$ such
that $|z - z_1| > R_0$ (i.e., for all $z$ outside $B[z_1, R_0]$). Since $B[z_1, R_0]$ is compact,
there exists a point $z_0 \in B[z_1, R_0]$ such that $|f(z_0)| \le |f(z)|$ for all $z \in B[z_1, R_0]$.

In particular, $|f(z_0)| \le |f(z_1)|$, and hence we see that $|f(z_0)| \le |f(z)|$ for all $z \in \mathbb{C}$.
In other words, $z_0$ is an absolute minimum of $|f|$. We claim that $f(z_0) = 0$.

To show that $f(z_0) = 0$, we assume that $f(z_0) \neq 0$ and arrive at a
contradiction. By a suitable choice of constants $c_i$, we may write $f$ in the form

$$f(z) = c_0 + c_1(z - z_0) + \cdots + c_n(z - z_0)^n .$$

If $f(z_0) \neq 0$ then $c_0 = f(z_0) \neq 0$. By assumption, $\deg f \ge 1$, so we let $m$ be the
smallest integer greater than 0 such that $c_m \neq 0$. Defining the new variable
$w = z - z_0$, we may define the polynomial function $g$ by

$$f(z) = g(w) = c_0 + c_m w^m + w^{m+1} h(w)$$

for some polynomial $h$.

Now let $w_1$ be a complex number such that $w_1^m = -c_0/c_m$ and consider all
values of $w = \lambda w_1$ for real $\lambda$ with $0 \le \lambda \le 1$. Then

$$c_m w^m = c_m \lambda^m w_1^m = -c_0 \lambda^m$$

and hence

$$f(z) = g(\lambda w_1) = c_0 - \lambda^m c_0 + \lambda^{m+1} w_1^{m+1} h(\lambda w_1) = c_0 \left[1 - \lambda^m + \lambda^{m+1} w_1^{m+1} c_0^{-1} h(\lambda w_1)\right] .$$

But $\lambda \in [0, 1]$, which is compact, and hence $|w_1^{m+1} c_0^{-1} h(\lambda w_1)|$ is a continuous
function defined on a compact set. Then the image of this function is a
compact subset of $\mathbb{R}$ (Theorem A8) and so is closed and bounded (Theorem
A14). This means there exists a number $B > 0$ such that

$$|w_1^{m+1} c_0^{-1} h(\lambda w_1)| \le B$$

for all $\lambda \in [0, 1]$, and therefore (since $0 \le \lambda \le 1$ implies that $0 \le \lambda^m \le 1$)

$$\begin{aligned} |g(\lambda w_1)| &= |c_0| \left|1 - \lambda^m + \lambda^{m+1} w_1^{m+1} c_0^{-1} h(\lambda w_1)\right| \\ &\le |c_0| \left\{ |1 - \lambda^m| + \lambda^{m+1} \left|w_1^{m+1} c_0^{-1} h(\lambda w_1)\right| \right\} \\ &\le |c_0| \, (1 - \lambda^m + \lambda^{m+1} B) . \end{aligned}$$

Now recall that $|c_0| = |f(z_0)| \le |f(z)|$ for all $z \in \mathbb{C}$. If we can show that

$$0 < 1 - \lambda^m + \lambda^{m+1} B < 1$$

for sufficiently small $\lambda$ with $0 < \lambda < 1$, then we will have shown that
$|f(z)| = |g(\lambda w_1)| < |c_0|$ for $z = z_0 + \lambda w_1$, a contradiction. But it is obvious that $\lambda$
can be chosen so that $0 < 1 - \lambda^m + \lambda^{m+1} B$. And to require that $1 - \lambda^m + \lambda^{m+1} B < 1$
is the same as requiring that $\lambda B < 1$, which can certainly be satisfied for small enough $\lambda$. ∎
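The contradiction step of this proof is constructive: given $z_0$ with $f(z_0) \neq 0$, it produces a nearby point where $|f|$ is smaller. The Python sketch below follows it under stated assumptions: coefficients are listed as $a_0, \dots, a_n$, arithmetic is floating-point complex, and $\lambda$ is simply halved until a decrease appears. The helper names are hypothetical, and this illustrates the argument rather than giving a practical root-finding method.

```python
def horner(coeffs, z):
    """Evaluate f(z) = coeffs[0] + coeffs[1] z + ... + coeffs[n] z**n."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result


def taylor_shift(coeffs, z0):
    """Return c_0, ..., c_n with f(z) = sum_k c_k (z - z0)**k (repeated synthetic division)."""
    c = [complex(a) for a in coeffs]
    n = len(c) - 1
    for i in range(n):
        for k in range(n - 1, i - 1, -1):
            c[k] += z0 * c[k + 1]
    return c


def descent_step(coeffs, z0, max_halvings=60):
    """Find z with |f(z)| < |f(z0)|, assuming f(z0) != 0 and deg f >= 1.

    Follows the proof: w1 is an m-th root of -c_0/c_m, and lambda in (0, 1]
    is halved until |g(lambda * w1)| < |c_0|.
    """
    c = taylor_shift(coeffs, z0)
    c0 = c[0]
    if c0 == 0:
        raise ValueError("f(z0) = 0 already")
    m = next(k for k in range(1, len(c)) if c[k] != 0)  # smallest m > 0 with c_m != 0
    w1 = (-c0 / c[m]) ** (1.0 / m)  # principal m-th root, so w1**m = -c_0/c_m
    lam = 1.0
    for _ in range(max_halvings):
        z = z0 + lam * w1
        if abs(horner(coeffs, z)) < abs(c0):
            return z
        lam /= 2
    raise RuntimeError("no decrease found (floating-point limits?)")
```

For $m = 1$ and $\lambda = 1$ the step is exactly a Newton step $z_0 - f(z_0)/f'(z_0)$; the proof only needs that some small $\lambda$ gives a decrease.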

Exercises 

1. Show the absolute value function is continuous on R. 

2. Show that the norm on a vector space V defines a metric on V. 

3. Let (X, d) be a metric space. Prove: 

(a) Both $X$ and $\emptyset$ are closed sets.

(b) The intersection of an arbitrary number of closed sets is closed. 

(c) The union of a finite number of closed sets is closed. 

4. Let $A$ be the subset of $[0, 1]$ consisting of all $x \in [0, 1]$ whose decimal
expansion contains only the digits 4 and 7. Explain whether or not $A$ is
countable, dense in $[0, 1]$, or compact.

5. Show that $\{x : \|x\|_2 \le 1\}$ is closed and bounded but not compact in the
space $\ell^2$ (see Example 12.8).

6. A metric space is said to be separable if it contains a countable dense
subset. Prove that $\mathbb{R}^n$ is separable. [Hint: Consider the set of all points in
$\mathbb{R}^n$ with rational coordinates.]
In this appendix we briefly go through all of the theory necessary for an
understanding of Section 10.6 and Chapter 12. Furthermore, as we mentioned
at the beginning of Appendix A, rather than being simply an overview, we
want the reader to understand this material even if it has not been studied
before. We do assume, however, that the reader has studied Appendix A, and
part of this appendix is a direct extension of that material.

A sequence $\{x_n\} = \{x_1, x_2, \dots\}$ in a set $S$ is any function from the set $\mathbb{Z}^+$
of positive integers into $S$. If $(X, d)$ is a metric space, then a sequence $\{x_n\}$ in
$X$ is said to converge to the limit $x$ if for each $B(x, \varepsilon)$ there exists $N \in \mathbb{Z}^+$
such that $x_n \in B(x, \varepsilon)$ for every $n > N$. In other words, given $\varepsilon > 0$, there exists
a positive integer $N$ such that $n > N$ implies $d(x_n, x) < \varepsilon$. This is usually
written as $\lim x_n = x$ or $x_n \to x$. If a sequence $\{x_n\}$ does not converge, then it
is said to diverge. Furthermore, if for every real number $M$ there exists an
integer $N$ such that $n > N$ implies $x_n > M$, then we write $x_n \to +\infty$. Similarly,
if for every real number $M$ there exists an integer $N$ such that $n > N$ implies
$x_n < M$, then we write $x_n \to -\infty$.

(We remark that the small, always positive number $\varepsilon$ will be used
extensively in many proofs, and it is important to realize that for proofs of
convergence there is no real difference between the number $\varepsilon$ and certain
simple functions of $\varepsilon$ such as $2\varepsilon$. For example, suppose we can show that
given $\varepsilon$, there exists $N$ such that $n > N$ implies $d(x_n, x) < 2\varepsilon$. We claim this
also proves that $x_n \to x$. Indeed, let $\varepsilon' = \varepsilon/2$. Then, by assumption, there exists
$N'$ such that $n > N'$ implies $d(x_n, x) < 2\varepsilon' = \varepsilon$, which shows that $x$ is the limit of
the sequence. It should be clear that the statement "given $\varepsilon > 0$" is equivalent
to saying "given $C\varepsilon > 0$" for any finite $C > 0$.)

The set of all points $x_n$ for $n = 1, 2, \dots$ is called the range of the sequence
$\{x_n\}$. This may be either a finite or an infinite set of points. A set $A \subset (X, d)$
is said to be bounded if there exists a real number $M$ and a point $x_0 \in X$ such
that $d(x, x_0) \le M$ for all $x \in A$. (The point $x_0$ is almost always taken to be the
origin of any given coordinate system in $X$.) The sequence $\{x_n\}$ is said to be
bounded if its range is a bounded set. It is easy to show that any convergent
sequence is bounded. Indeed, if $x_n \to x$ then, given $\varepsilon = 1$, there exists $N$ such that
$n > N$ implies $d(x_n, x) < 1$. This shows that $\{x_n : n > N\}$ is bounded. To show
that $\{x_n : n = 1, \dots, N\}$ is bounded, let

$$r = \max\{1, d(x_1, x), \dots, d(x_N, x)\} .$$

Since $x_n \in X$, it must be true that each $d(x_n, x)$ is finite, and hence $d(x_n, x) \le r$
for each $n = 1, \dots, N$. Since also $d(x_n, x) < 1 \le r$ for $n > N$, every term of the
sequence lies within distance $r$ of $x$.

We now prove several elementary properties of sequences, starting with 
the uniqueness of the limit. 

Theorem B1 If $\{x_n\}$ is a sequence in a metric space $(X, d)$ such that $x_n \to x$
and $x_n \to y$, then $x = y$.

Proof Given $\varepsilon > 0$, there exists $N$ such that $n > N$ implies $d(x_n, x) < \varepsilon$ and
$d(x_n, y) < \varepsilon$. But then $d(x, y) \le d(x_n, x) + d(x_n, y) < 2\varepsilon$. Since this holds for all
$\varepsilon > 0$, we must have $d(x, y) = 0$ and hence $x = y$ (see Appendix A, definition (M2)). ∎

Theorem B2 Let $s_n \to s$ and $t_n \to t$ be convergent sequences of complex
numbers. Then

(a) $\lim (s_n + t_n) = s + t$.

(b) $\lim c s_n = cs$ and $\lim (c + s_n) = c + s$ for any $c \in \mathbb{C}$.

(c) $\lim s_n t_n = st$.

(d) $\lim 1/s_n = 1/s$ if $s \neq 0$ and $s_n \neq 0$ for all $n$.

Proof (a) Given $\varepsilon > 0$, there exist $N_1$ and $N_2$ such that $n > N_1$ implies that
$|s_n - s| < \varepsilon/2$ and $n > N_2$ implies that $|t_n - t| < \varepsilon/2$. Let $N = \max\{N_1, N_2\}$. Then
$n > N$ implies

$$|(s_n - s) + (t_n - t)| \le |s_n - s| + |t_n - t| < \varepsilon .$$

(b) This is Exercise B.1.
(c) Note the algebraic identity

$$s_n t_n - st = (s_n - s)(t_n - t) + s(t_n - t) + t(s_n - s) .$$

By parts (a) and (b) we see that $\lim s(t_n - t) = 0 = \lim t(s_n - s)$. Now, given
$\varepsilon > 0$, there exist $N_1$ and $N_2$ such that $n > N_1$ implies

$$|s_n - s| < \sqrt{\varepsilon}$$

and $n > N_2$ implies

$$|t_n - t| < \sqrt{\varepsilon} .$$

If $N = \max\{N_1, N_2\}$, then $n > N$ implies that

$$|(s_n - s)(t_n - t)| < \varepsilon$$

and hence $\lim (s_n t_n - st) = \lim (s_n - s)(t_n - t) = 0$.

(d) At the end of Section 0.4 we showed that $|a| - |b| \le |a + b|$ for any
$a, b \in \mathbb{R}$. Reviewing the proof shows that this applies to complex numbers as
well. Letting $b \to -b$ then shows that $|a| - |b| \le |a - b|$.

Given $|s|/2 > 0$, there exists $m$ such that $n > m$ implies $|s_n - s| < |s|/2$ (this
follows from the fact that $s_n \to s$). But $|s| - |s_n| \le |s_n - s| < |s|/2$ implies
$|s_n| > |s|/2$.

Similarly, given $\varepsilon > 0$, there exists $N$ (which we can always choose
greater than $m$) such that $n > N$ implies $|s_n - s| < |s|^2 \varepsilon/2$. Combining these
results, we see that for all $n > N$ we have

$$\left| \frac{1}{s_n} - \frac{1}{s} \right| = \frac{|s - s_n|}{|s_n|\,|s|} < \frac{|s|^2 \varepsilon/2}{(|s|/2)\,|s|} = \varepsilon .$$

∎

Intuitively, we expect that a sequence $\{x_k\}$ of points in $\mathbb{R}^n$ converges to a
point in $\mathbb{R}^n$ if and only if each of the $n$ coordinates converges on its own. That
this is indeed true is shown in our next theorem.

Theorem B3 Suppose $x_k = (x_k^1, \dots, x_k^n) \in \mathbb{R}^n$. Then $\{x_k\}$ converges to
$x = (x^1, \dots, x^n) \in \mathbb{R}^n$ if and only if $x_k^i \to x^i$ for each $i = 1, \dots, n$.

Proof First note that for any $j = 1, \dots, n$ we have

$$\|x - y\|^2 = \sum_{i=1}^n (x^i - y^i)^2 \ge (x^j - y^j)^2$$

which implies $|x^j - y^j| \le \|x - y\|$. Now assume that $x_k \to x$. Then given $\varepsilon > 0$,
there exists $N$ such that $k > N$ implies $|x_k^j - x^j| \le \|x_k - x\| < \varepsilon$. This shows that
$x_k \to x$ implies $x_k^j \to x^j$.

Conversely, assume $x_k^j \to x^j$ for every $j = 1, \dots, n$. Then given $\varepsilon > 0$,
there exists $N$ (the largest of the indices obtained for the individual coordinates)
such that $k > N$ implies $|x_k^j - x^j| < \varepsilon/\sqrt{n}$ for every $j$. Hence $k > N$ implies

$$\|x_k - x\| = \Big[\sum_{j=1}^n (x_k^j - x^j)^2\Big]^{1/2} < \big[n(\varepsilon^2/n)\big]^{1/2} = \varepsilon$$

so that $x_k \to x$. ∎
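The two inequalities used in this proof, $|x^j - y^j| \le \|x - y\|$ and $\|x - y\| \le \sqrt{n}\,\max_j |x^j - y^j|$, can be spot-checked with a small Python sketch. The function names are illustrative only.

```python
import math


def norm(v):
    """Euclidean norm ||v|| on R^n."""
    return math.sqrt(sum(t * t for t in v))


def coordinate_bounds(x, y):
    """Return (max_j |x^j - y^j|, ||x - y||, sqrt(n) * max_j |x^j - y^j|).

    The first value never exceeds the second, and the second never exceeds the third.
    """
    diffs = [abs(s - t) for s, t in zip(x, y)]
    largest = max(diffs)
    return largest, norm([s - t for s, t in zip(x, y)]), math.sqrt(len(diffs)) * largest
```

The right-hand inequality is what makes the tolerance $\varepsilon/\sqrt{n}$ per coordinate sufficient: if every coordinate is within $\varepsilon/\sqrt{n}$, the Euclidean distance is below $\varepsilon$.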

A sequence $\{x_n\}$ in a metric space $(X, d)$ is said to be a Cauchy sequence
if given $\varepsilon > 0$, there exists $N$ such that $n, m > N$ implies $d(x_n, x_m) < \varepsilon$. It is
easy to see that every convergent sequence is in fact a Cauchy sequence.
Indeed, simply note that if $x_n \to x$, then given $\varepsilon > 0$, there exists $N$ such that
$n > N$ implies $d(x_n, x) < \varepsilon/2$. Hence if $n, m > N$ we have

$$d(x_n, x_m) \le d(x_n, x) + d(x_m, x) < \varepsilon/2 + \varepsilon/2 = \varepsilon .$$

However, a Cauchy sequence need not converge. For example, suppose
$X = (0, 1] \subset \mathbb{R}$ and let $\{x_k\} = \{1/k\}$ for $k = 1, 2, \dots$. This is a
Cauchy sequence that wants to converge to the point $0$ (choose $N \ge 1/\varepsilon$ so that
$|1/n - 1/m| \le |1/n| + |1/m| < 2\varepsilon$ for all $m, n > N$). But $0 \notin (0, 1]$, so the
limit of the sequence is not in the space. This example shows that convergence
is not an intrinsic property of sequences, but rather depends on the space in
which the sequence lies. A metric space in which every Cauchy sequence
converges is said to be complete (see Appendix A).
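For the sequence $\{1/k\}$ a slightly sharper estimate than the one above holds: if $N \ge 1/\varepsilon$ and $n, m > N$, then both $1/n$ and $1/m$ lie in $(0, \varepsilon)$, so $|1/n - 1/m| < \varepsilon$. The Python sketch below checks this with exact rationals over a finite range of indices. It is a finite check only, and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def cauchy_N(eps: Fraction) -> int:
    """N = ceil(1/eps): for n, m > N both 1/n and 1/m lie in (0, eps)."""
    return math.ceil(1 / eps)


def max_gap(N: int, span: int) -> Fraction:
    """Largest |1/n - 1/m| over N < n, m <= N + span (exact rational arithmetic)."""
    terms = [Fraction(1, k) for k in range(N + 1, N + span + 1)]
    return max(terms) - min(terms)
```

Every term $1/k$ is strictly positive, so the value $0$ that the terms approach is never reached inside $(0, 1]$.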

We have shown that any convergent sequence is necessarily a Cauchy
sequence, but that the converse is not true in general. However, in the case of
$\mathbb{R}^n$, it is in fact true that every Cauchy sequence does indeed converge, i.e., $\mathbb{R}^n$
is a complete metric space. This is easy to prove using the fact that any n-cell
in $\mathbb{R}^n$ is compact (Theorem A11), and we outline the proof in Exercise B.10.
However, it is worth proving that $\mathbb{R}^n$ is complete without using this result. We
begin by proving several other facts dealing with the real number line $\mathbb{R}$. By
way of terminology, a sequence $\{x_n\}$ of real numbers with the property that
$x_n \le x_{n+1}$ is said to be increasing. Similarly, if $x_n \ge x_{n+1}$ then the sequence is
said to be decreasing. We will sometimes use the term monotonic to refer to
a sequence that is either increasing or decreasing.
Theorem B4 Let $\{x_k\}$ be an increasing sequence (i.e., $x_k \le x_{k+1}$) of real
numbers that is bounded above. Then the least upper bound $b$ of the set $\{x_k\}$
is the limit of the sequence.

Proof It should be remarked that the existence of the least upper bound is
guaranteed by the Archimedean axiom. Given $\varepsilon > 0$, the number $b - \varepsilon/2$ is not
an upper bound for $\{x_k\}$ since $b$ is by definition the least upper bound.
Therefore there exists $N$ such that $b - \varepsilon/2 < x_N \le b$ (for otherwise $b - \varepsilon/2$
would be an upper bound smaller than $b$). Since $\{x_k\}$ is increasing, we have

$$b - \varepsilon/2 < x_N \le x_n \le b$$

for every $n \ge N$. Rearranging, this is just $b - x_n < \varepsilon/2 < \varepsilon$, which (since $x_n \le b$)
is the same as $|x_n - b| < \varepsilon$. ∎
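The index $N$ in this proof can be located explicitly for a concrete sequence. The Python sketch below assumes the hypothetical example $x_k = 1 - 1/2^k$, whose least upper bound is $b = 1$, and searches for the first $N$ with $b - \varepsilon/2 < x_N$; exact rationals avoid any rounding questions.

```python
from fractions import Fraction


def example_sequence(k: int) -> Fraction:
    """Hypothetical example x_k = 1 - 1/2**k: increasing, with least upper bound 1."""
    return 1 - Fraction(1, 2 ** k)


def first_index_within(x, b, eps, max_k=10_000):
    """Smallest N with b - eps/2 < x(N), as in the proof of Theorem B4.

    x maps k to x_k for an increasing sequence whose least upper bound is b.
    """
    for N in range(1, max_k + 1):
        if b - eps / 2 < x(N):
            return N
    raise ValueError("no such N found up to max_k")
```

For $\varepsilon = 1/100$ this gives $N = 8$, since $2^8 = 256$ is the first power of 2 exceeding 200; because the sequence is increasing, every later term is also within $\varepsilon$ of $b$.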

Since the Archimedean axiom also refers to the greatest lower bound of a 
set of real numbers, it is clear that Theorem B4 may be applied equally well to 
the greatest lower bound of a decreasing sequence. 

Let $(X, d)$ be a metric space, and suppose $A$ is a subset of $X$. Recall from
Appendix A that a point $x \in X$ is said to be an accumulation point of $A$ if
every deleted neighborhood of $x$ contains a point of $A$. The analogous term for
sequences is the following. A number $x$ is said to be a cluster point (or limit
point) of a sequence $\{x_n\}$ if given $\varepsilon > 0$, there exist infinitely many integers $n$
such that $|x_n - x| < \varepsilon$. Equivalently, $x$ is a cluster point if given $\varepsilon > 0$ and given
$N$, there exists some $n > N$ such that $|x_n - x| < \varepsilon$. Note that this does not say
that there are infinitely many distinct $x_n$ such that $|x_n - x| < \varepsilon$. In fact, all the
$x_n$ could be identical. It is important to distinguish between the indices $n$ and
the actual elements $x_n$ of the sequence. It is also important to realize that a
limit point of a sequence is not the same as the limit of a sequence (why?).
Note also that a sequence in $X$ may be considered to be a subset of $X$, and in
this context we may also refer to the accumulation points of a sequence.

Our next result is known as the Bolzano-Weierstrass Theorem.

Theorem B5 (Bolzano-Weierstrass) Let $\{x_k\}$ be a sequence of real
numbers, and let $a, b \in \mathbb{R}$ be such that $a \le x_k \le b$ for all positive integers $k$. Then
there exists a cluster point $c$ of the sequence with $a \le c \le b$.

Proof For each $n$, the sequence $\{x_n, x_{n+1}, \dots\}$ is bounded below (by $a$), and
hence has a greatest lower bound (whose existence is guaranteed by the
Archimedean axiom), which we denote by $c_n$. Then $\{c_n\}$ forms an increasing
sequence $c_n \le c_{n+1} \le \cdots$ which is bounded above by $b$. Theorem B4 now
shows that the sequence $\{c_n\}$ has a least upper bound $c$ (with $a \le c \le b$) which
is in fact the limit of the sequence $\{c_n\}$. We must show that $c$ is a cluster point
of the sequence $\{x_k\}$.

To say that $c$ is the limit of the sequence $\{c_n\}$ means that given $\varepsilon > 0$ and
given any $N$, there exists some $m > N$ such that

$$|c_m - c| < \varepsilon/2 .$$

By definition, $c_m$ is the greatest lower bound of the set $\{x_m, x_{m+1}, \dots\}$, which
means there exists $k \ge m$ such that $c_m \le x_k < c_m + \varepsilon/2$, or

$$|x_k - c_m| < \varepsilon/2 .$$

Therefore $k \ge m > N$ and

$$|x_k - c| = |x_k - c_m + c_m - c| \le |x_k - c_m| + |c_m - c| < \varepsilon$$

which shows that $c$ is a cluster point of the sequence $\{x_k\}$. ∎
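The construction $c_n = \inf\{x_n, x_{n+1}, \dots\}$ can be imitated on a computer only for a finite truncation of the sequence, which is the main assumption of the Python sketch below. For the hypothetical example $x_k = (-1)^k + 1/k$ the cluster points are $-1$ and $1$, and the construction picks out $-1$, the smaller one.

```python
def tail_infima(xs):
    """Return c with c[n] = min(xs[n:]), computed in one right-to-left pass.

    For a finite truncation xs of the sequence this imitates
    c_n = inf{x_n, x_{n+1}, ...}; the truncation is only an approximation.
    """
    out = [0.0] * len(xs)
    running = float("inf")
    for i in range(len(xs) - 1, -1, -1):
        running = min(running, xs[i])
        out[i] = running
    return out
```

Only the early $c_n$ are trustworthy: near the end of the list the tails are short and their minima drift upward, so one should estimate $c$ from the early $c_n$ (for example, the largest of the first thousand of them).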

Note that Theorem B5 also follows from Theorem A12.

Theorem B6 If {x_n} is a Cauchy sequence of numbers, then it is bounded.

Proof By definition of a Cauchy sequence, given ε = 1 there exists N such that n ≥ N implies |x_n - x_N| < 1. Hence |x_n| - |x_N| ≤ |x_n - x_N| < 1 implies |x_n| < |x_N| + 1 for every n ≥ N. Define B = max{|x_1|, . . . , |x_N|, |x_N| + 1}. Then B is clearly a bound for {x_n}. ∎
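Once an N that works for ε = 1 is known, the bound B in this proof can be computed directly. The sketch below assumes the hypothetical sequence x_n = 3 + (-1)^n/n, for which |x_n - x_3| ≤ 1/4 + 1/3 < 1 whenever n ≥ 3, so N = 3 works; the code only checks finitely many terms.

```python
from typing import Sequence


def cauchy_bound(xs: Sequence[float], N: int) -> float:
    """B = max{|x_1|, ..., |x_N|, |x_N| + 1} from the proof of Theorem B6 (N is 1-indexed).

    Assumes |x_n - x_N| < 1 for every n >= N, i.e. N works for epsilon = 1.
    """
    head = [abs(x) for x in xs[:N]]          # |x_1|, ..., |x_N|
    return max(head + [abs(xs[N - 1]) + 1])  # also include |x_N| + 1


# Hypothetical example: x_n = 3 + (-1)^n / n, truncated at n = 10000.
xs = [3 + (-1) ** n / n for n in range(1, 10_001)]
N = 3
assert all(abs(x - xs[N - 1]) < 1 for x in xs[N - 1:])  # epsilon = 1 condition on this truncation

B = cauchy_bound(xs, N)
print(round(B, 4))                   # 3.6667, i.e. |x_3| + 1 = 11/3
print(all(abs(x) <= B for x in xs))  # True
```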

Theorem B7 Any Cauchy sequence {x_n} of numbers converges.

Proof From Theorem B6 the sequence {x_n} has a bound B, and hence we have -B ≤ x_n ≤ B for all n. Hence by Theorem B5, the sequence {x_n} has a cluster point c. We claim that c is the limit of the sequence. Since the sequence is Cauchy, given ε > 0 there exists N such that m, n ≥ N implies |x_m - x_n| < ε/2. Using this ε, we see that because c is a cluster point, there exists m ≥ N such that |x_m - c| < ε/2. Combining these last two results shows that for all n ≥ N

$$|x_n - c| \le |x_n - x_m| + |x_m - c| < \varepsilon .$$

∎

We are now in a position to prove our principal assertion. 
Theorem B8 R^n is a complete metric space. In other words, every Cauchy sequence in R^n converges to a point in R^n.

Proof Let {x_k} be a Cauchy sequence in R^n. Then |x_m^j - x_n^j| ≤ |x_m - x_n| (see the proof of Theorem B3), so that {x_k^j} is also a Cauchy sequence in R for each j = 1, . . . , n. Hence by Theorem B7 each of the sequences {x_k^j} also converges in R. Therefore (by Theorem B3) the sequence {x_k} must converge in R^n. ∎

We have seen that any convergent sequence is a Cauchy sequence and 
hence bounded. However, the converse is not generally true. (For example, 
the sequence {1, 2, 1, 2, . . . } is clearly bounded but does not converge to 
either 1 or 2.) There is, however, a special case in which the converse is true 
that will be of use to us. 

Theorem B9 A monotonic sequence {x n } converges if and only if it is 
bounded. 

Proof We consider increasing sequences. The proof for decreasing sequences is similar. It was shown above that any convergent sequence is bounded, so we need only consider the converse. But this was proved in Theorem B4. ∎

Finally, accumulation points are useful in determining whether or not a set 
is closed. The principal result relating these concepts is the following. 

Theorem B10 A subset A of a metric space (X, d) is closed if and only if A contains all of its accumulation points.

Proof This is also Theorem A13. ∎

Before continuing, we must make a digression to discuss some more basic 
properties of metric spaces. If the reader already knows that a point x in a 
subset A of a metric space X is in the closure of A if and only if every neigh- 
borhood of x intersects A, then he/she may skip to Theorem B16 below. 

Let (X, d) be a metric space, and suppose A ⊂ X. We define

(a) The closure of A, denoted by Cl A, to be the intersection of all closed supersets of A;

(b) The interior of A, denoted by Int A (or A°), to be the union of all open subsets of A;

(c) The boundary of A, denoted by Bd A, to be the set of all x ∈ X such that every open set containing x contains both points of A and points of A^c = X - A;

(d) The exterior of A, denoted by Ext A, to be (Cl A)^c = X - Cl A;

(e) The derived set of A, denoted by A', to be the set of all accumulation points of A.

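Example B1 Let X = R^2 with the Pythagorean metric. Let A be the open unit ball defined by A = {(x, y): x^2 + y^2 < 1}. Then the following sets should be intuitively clear to the reader:

$$\begin{aligned} \operatorname{Cl} A &= \{(x, y) : x^2 + y^2 \le 1\}; \\ \operatorname{Int} A &= A; \\ \operatorname{Bd} A &= \{(x, y) : x^2 + y^2 = 1\}; \\ \operatorname{Ext} A &= \{(x, y) : x^2 + y^2 > 1\}; \\ A' &= \operatorname{Cl} A . \end{aligned}$$

∆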

Theorem B11 Let (X, d) be a metric space and suppose A, B ⊂ X. Then

(a) A ⊂ Cl A.

(b) Cl(Cl A) = Cl A.

(c) Cl(A ∪ B) = (Cl A) ∪ (Cl B).

(d) Cl ∅ = ∅.

(e) A is closed if and only if A = Cl A.

(f) Cl(A ∩ B) ⊂ (Cl A) ∩ (Cl B).

Proof Parts (a), (b), (d) and (e) are essentially obvious from the definition of Cl A, the fact that the intersection of an arbitrary number of closed sets is closed (see Exercise A.3), the fact that the empty set is a subset of every set, and the fact that if A is closed, then A is one of the closed sets which contains A.

(c) First note that if S ⊂ T, then any closed superset of T is also a closed superset of S, and therefore Cl S ⊂ Cl T. Next, observe that A ⊂ A ∪ B and B ⊂ A ∪ B, so that taking the closure of both sides of each of these relations yields

$$\operatorname{Cl} A \subset \operatorname{Cl}(A \cup B)$$

and

$$\operatorname{Cl} B \subset \operatorname{Cl}(A \cup B) .$$

Together these show that (Cl A) ∪ (Cl B) ⊂ Cl(A ∪ B). Since Cl A and Cl B are both closed, (Cl A) ∪ (Cl B) must also be closed and contain A ∪ B. Hence we also have

$$\operatorname{Cl}(A \cup B) \subset (\operatorname{Cl} A) \cup (\operatorname{Cl} B) .$$

This shows that Cl(A ∪ B) = (Cl A) ∪ (Cl B).

(f) By (a) we have A ∩ B ⊂ A ⊂ Cl A and A ∩ B ⊂ B ⊂ Cl B, and hence A ∩ B is a subset of the closed set (Cl A) ∩ (Cl B). But by definition of closure this means that

$$\operatorname{Cl}(A \cap B) \subset (\operatorname{Cl} A) \cap (\operatorname{Cl} B) .$$

∎

Theorem B12 Let (X, d) be a metric space and suppose A ⊂ X. Then

(a) Cl A = A ∪ A'.

(b) Cl A = Int A ∪ Bd A.

(c) Bd A = Bd A^c.

(d) Int A = Cl A - Bd A.

Proof (a) Assume x ∈ Cl A but that x ∉ A. We first show that x ∈ A'. Let U be any open set containing x. If U ∩ A = ∅, then U^c is a closed superset of A which implies Cl A ⊂ U^c. But this contradicts the assumption that x ∈ U since it was assumed that x ∈ Cl A. Therefore, since x ∉ A, we must have

$$(U - \{x\}) \cap A \neq \varnothing$$

so that x ∈ A'. This shows that Cl A ⊂ A ∪ A'.

Now assume that x ∈ A ∪ A'. If x ∈ A, then obviously x ∈ Cl A (since A ⊂ Cl A), so suppose x ∈ A'. We will show that x is contained in any closed superset of A. Let F be any closed superset of A not containing x. Then F^c is an open set containing x and such that (F^c - {x}) ∩ A = ∅, which says that x ∉ A', a contradiction. Thus x is contained in any closed superset of A so that x ∈ Cl A. Since this shows that A ∪ A' ⊂ Cl A, it follows that A ∪ A' = Cl A.

(b) We first suppose that x ∈ Cl A but x ∉ Bd A. Since x ∉ Bd A, there exists an open set U containing x such that either U ⊂ A or U ⊂ A^c. If it were true that U ⊂ A^c, then U^c would be a closed superset of A (see Example 0.1) so that Cl A ⊂ U^c, which contradicts the assumption that x ∈ Cl A. We must therefore have U ⊂ A, and hence x ∈ Int A. Since the assumption that x ∈ Cl A but x ∉ Bd A led to the requirement that x ∈ Int A, we must have

$$\operatorname{Cl} A \subset \operatorname{Int} A \cup \operatorname{Bd} A .$$
Now assume x ∈ Int A ∪ Bd A, but that x ∉ Cl A. Note Int A ⊂ A ⊂ Cl A so that x ∉ Int A, and hence it must be true that x ∈ Bd A. However, since x ∉ Cl A, (Cl A)^c is an open set containing x with the property that (Cl A)^c ∩ A = ∅. But this says that x ∉ Bd A, which contradicts our original assumption. In other words, we must have Int A ∪ Bd A ⊂ Cl A, so that

$$\operatorname{Cl} A = \operatorname{Int} A \cup \operatorname{Bd} A .$$

(c) If x ∈ Bd A and U is any open set containing x, then U ∩ A ≠ ∅ and U ∩ A^c ≠ ∅. But A = (A^c)^c so that we also have U ∩ (A^c)^c ≠ ∅. Together with U ∩ A^c ≠ ∅, this shows that x ∈ Bd A^c. Reversing the argument shows that if x ∈ Bd A^c, then x ∈ Bd A. Hence Bd A = Bd A^c.

(d) This will follow from part (b) if we can show that Int A ∩ Bd A = ∅. Now suppose that x ∈ Int A ∩ Bd A. Then since x ∈ Bd A, it must be true that every open set containing x intersects A^c. But this contradicts the assumption that x ∈ Int A (since by definition there must exist an open set U containing x such that U ⊂ A), and hence we must have Int A ∩ Bd A = ∅. ∎

It should be remarked that some authors define Bd A as Cl A - Int A, so that our definition of Bd A follows as a theorem. This fact, along with some additional insight, is contained in the following theorem.

Theorem B13 Let A be a subset of a metric space (X, d), and suppose x ∈ X. Then

(a) x ∈ Int A if and only if some neighborhood of x is a subset of A.

(b) x ∈ Cl A if and only if every neighborhood of x intersects A.

(c) Bd A = Cl A - Int A.

Proof (a) If x ∈ Int A, then by definition of Int A there exists a neighborhood U of x such that U ⊂ A. On the other hand, if there exists an open set U such that x ∈ U ⊂ A, then x ∈ Int A.

(b) By Theorem B12(a), Cl A = A ∪ A'. If x ∈ Cl A and x ∈ A, then clearly every neighborhood of x intersects A. If x ∈ Cl A and x ∈ A', then every neighborhood of x also intersects A. Conversely, suppose that every neighborhood of x intersects A. Then either x ∈ A or x ∈ A', and hence x ∈ A ∪ A' = Cl A.

(c) By definition, x ∈ Bd A if and only if every open set containing x contains both points of A and points of A^c. In other words, x ∈ Bd A if and only if every open set containing x contains points of A but is not a subset of A. By parts (a) and (b), this is just Bd A = Cl A - Int A. ∎
Example B2 An elementary fact that will be referred to again is the following. Let A be a nonempty set of real numbers that is bounded above. Then by the Archimedean axiom, A has a least upper bound b = sup A. Given ε > 0, there must exist x ∈ A such that b - ε < x ≤ b, for otherwise b - ε would be an upper bound of A. But this means that any neighborhood of b intersects A, and hence b ∈ Cl A. Thus b ∈ A if A is closed. ∆

Our next example yields an important basic result. 

Example B3 Let X = R with the standard (absolute value) metric, and let Q ⊂ R be the subset of all rational numbers. In Theorem 0.4 it was shown that given any two distinct real numbers there always exists a rational number between them. This may also be expressed by stating that any neighborhood of any real number always contains a rational number. In other words, we have Cl Q = R. ∆

From Theorems B10 and B12(a), we might guess that there is a relationship between sequences and closed sets. This is indeed the case, and our next theorem provides a very useful description of the closure of a set.

Theorem B14 (a) A set A ⊂ (X, d) is closed if and only if for every sequence {x_n} in A that converges, the limit is an element of A.

(b) If A ⊂ (X, d), then x ∈ Cl A if and only if there is a sequence {x_n} in A such that x_n → x.

Proof (a) Suppose that A is closed, and let {x_n} be a sequence in A with x_n → x. Since any neighborhood of x must contain all x_n for n sufficiently large, either x ∈ A or else every deleted neighborhood of x contains points of A, so that x is an accumulation point of A. In the latter case it follows from Theorem B10 that x ∈ A.

Conversely, assume that any sequence in A converges to an element of A, and let x be any accumulation point of A. We will construct a sequence in A that converges to x. To construct such a sequence, choose x_n ∈ B(x, 1/n) ∩ A. This is possible since x is an accumulation point of A. Then given ε > 0, choose N > 1/ε so that x_n ∈ B(x, ε) for every n ≥ N. Hence x_n → x so that x ∈ A. Theorem B10 then shows that A is closed.

(b) This is Exercise B.4. ∎
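Combining Example B3 with part (b), every real number is the limit of a sequence of rationals. A minimal sketch, choosing x_n ∈ B(x, 1/n) ∩ Q as in the proof of part (a) by the hypothetical rule x_n = ⌊nx⌋/n, which satisfies 0 ≤ x - x_n < 1/n. A floating-point value stands in for the real number √2, which is an approximation.

```python
import math
from fractions import Fraction


def rational_in_ball(x: float, n: int) -> Fraction:
    """Return q = floor(n*x)/n, a rational with 0 <= x - q < 1/n, so q lies in B(x, 1/n)."""
    return Fraction(math.floor(n * x), n)


x = math.sqrt(2)  # floating-point stand-in for the real number sqrt(2)
seq = [rational_in_ball(x, n) for n in range(1, 11)]

print([str(q) for q in seq])
print(all(abs(x - float(q)) < 1 / n for n, q in zip(range(1, 11), seq)))  # True
```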

If (X, d) is a metric space, then A ⊂ X is said to be somewhere dense if Int(Cl A) ≠ ∅. The set A is said to be nowhere dense if it is not somewhere dense. If Cl A = X, then A is said to be dense in X.
Example B4 Let X = R with the usual metric. The set A = [a, b) has closure Cl A = [a, b], and therefore Int(Cl A) = (a, b) ≠ ∅. Hence A is somewhere dense. Example B3 showed that the set Q is dense in R. Now let A = Z, the set of all integers. Z^c = R - Z is the union of open sets of the form (n, n + 1) where n is an integer, and hence Z is closed. It should also be clear that Z' = ∅ since there clearly exist deleted neighborhoods of any integer that do not contain any other integers. By Theorem B13(a), we also see that Int(Cl Z) = Int Z = ∅ so that Z is nowhere dense. ∆

Theorem B15 A subset A of a metric space (X, d) is dense if and only if every nonempty open subset U of X contains some point of A.

Proof Suppose A is dense so that Cl A = X. If U ⊂ X is open and nonempty, pick any x ∈ U. Since x ∈ Cl A, every neighborhood of x intersects A (Theorem B13). In particular, U is a neighborhood of x so that U ∩ A ≠ ∅. On the other hand, suppose that every nonempty open set U intersects A. If x ∈ X, then every neighborhood U of x must intersect A so that x ∈ Cl A. Since x was arbitrary, it must be true that Cl A = X. ∎

After this topological digression, let us return to sequences of numbers. Given a sequence {x_n}, we may consider a sequence {n_k} of positive integers that forms a subset of Z^+ such that n_1 < n_2 < · · · . The corresponding subset {x_{n_k}} of {x_n} is called a subsequence of {x_n}. If {x_{n_k}} converges, then its limit is called a subsequential limit of {x_n}. From the definitions, it should be clear that any cluster point of {x_n} is the limit of a convergent subsequence. It should also be clear that a sequence {x_n} converges to x if and only if every subsequence also converges to x (see Exercise B.5).

Theorem B16 The set of all subsequential limits of a sequence {x_n} in a metric space (X, d) forms a closed subset of X.

Proof Let S be the set of all subsequential limits of {x_n}. If y is an accumulation point of S, we must show that y ∈ S (Theorem B10), and hence that some subsequence of {x_n} converges to y. Choose n_1 such that x_{n_1} ≠ y (why can this be done?), and let δ = d(x_{n_1}, y). Now suppose that n_1, n_2, . . . , n_{k-1} have been chosen. Since y is an accumulation point of S, there exists z ∈ S, z ≠ y, such that d(z, y) < 2^{-k}δ. But z ∈ S implies that z is a subsequential limit, and hence there exists n_k > n_{k-1} such that d(z, x_{n_k}) < 2^{-k}δ. Therefore, for each k = 1, 2, . . . we have

$$d(x_{n_k}, y) \le d(x_{n_k}, z) + d(z, y) < 2^{1-k}\delta$$

so that the subsequence {x_{n_k}} converges to y. ∎



Given a sequence {a_n} of complex numbers (which can be regarded as points in R^2), we may define the infinite series (generally called simply a series)

$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + \cdots$$

as the sequence {s_n} where

$$s_n = \sum_{k=1}^{n} a_k .$$

If the sequence {s_n} converges to a number s, we say that the series converges and we write this as ∑_{n=1}^∞ a_n = s. Note that the number s is the limit of a sequence of partial sums. For notational convenience, the lower limit in the sum may be taken to be 0 instead of 1, and we will frequently write just ∑a_n when there is no danger of ambiguity and the proper limits are understood.

The reader should note that ∑_{n=1}^∞ a_n stands for two very different things. On the one hand it is used to stand for the sequence {s_n} of partial sums, and on the other hand it stands for lim_{n→∞} s_n. This is a common abuse of notation, and the context usually makes it clear which meaning is being used.

We have seen that any convergent sequence is Cauchy, and in Theorem B8 we showed that any Cauchy sequence in R^n converges. Thus, a sequence in R^n converges if and only if it is Cauchy. This is called the Cauchy criterion. Since the convergence of a series is defined in terms of the sequence of partial sums, we see that the Cauchy criterion may be restated as follows.



Theorem B17 A series of numbers ∑a_n converges if and only if given ε > 0, there exists an integer N such that m ≥ n > N implies

$$\left|\sum_{k=n}^{m} a_k\right| < \varepsilon .$$

Proof If the series ∑a_n converges, then the sequence {s_k} of partial sums s_k = ∑_{n=1}^{k} a_n converges, and hence {s_k} is Cauchy. Conversely, if the sequence of partial sums s_k is Cauchy, then {s_k} converges (Theorem B8). In either case, this means that given ε > 0, there exists N such that p > q > N implies

$$|s_p - s_q| = \left|\sum_{k=1}^{p} a_k - \sum_{k=1}^{q} a_k\right| = \left|\sum_{k=q+1}^{p} a_k\right| < \varepsilon .$$

The result now follows by choosing m = p and n = q + 1. ∎
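The quantity bounded in Theorem B17 is easy to compute for particular blocks of terms. The sketch below (hypothetical helper names, not from the text) compares blocks from n to m = 2n for ∑1/k and ∑1/k^2. For ∑1/k each such block has n + 1 terms, each at least 1/(2n), so it always exceeds 1/2 and the Cauchy criterion fails; for ∑1/k^2 the same blocks shrink. A finite computation like this can exhibit a failure of the criterion but cannot by itself prove convergence.

```python
def block_sum(a, n: int, m: int) -> float:
    """Return a(n) + a(n+1) + ... + a(m), the sum bounded in Theorem B17 (1-indexed, n <= m)."""
    return sum(a(k) for k in range(n, m + 1))


def harmonic(k: int) -> float:
    return 1 / k


def inverse_square(k: int) -> float:
    return 1 / k**2


for n in (10, 100, 1000):
    # Block from n to m = 2n: stays above 1/2 for sum 1/k, shrinks toward 0 for sum 1/k^2.
    print(n, block_sum(harmonic, n, 2 * n), block_sum(inverse_square, n, 2 * n))
```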

Another useful way of stating Theorem B17 is to say that a series ∑a_n converges if and only if given ε > 0, there exists N such that k > N implies that |a_k + · · · + a_{k+p}| < ε for all integers p = 0, 1, 2, . . . .

Corollary If ∑a_n converges, then given ε > 0 there exists N such that |a_n| < ε for all n > N. In other words, if ∑a_n converges, then lim_{n→∞} a_n = 0.

Proof This follows from Theorem B17 by letting m = n. ∎

While this corollary says that a necessary condition for ∑a_n to converge is that a_n → 0, this is not a sufficient condition (see Example B5 below).

If we have a series of nonnegative real numbers, then each partial sum is 
clearly nonnegative, and the sequence of partial sums forms a non-decreasing 
sequence. Thus, directly from Theorem B9 we have the following. 

Theorem B18 A series of nonnegative real numbers converges if and only if 
its partial sums form a bounded sequence. 

One consequence of this theorem is the next result. 

Theorem B19 Suppose ∑a_n is a series such that a_1 ≥ a_2 ≥ · · · ≥ 0. Then ∑a_n converges if and only if

$$\sum_{k=0}^{\infty} 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + \cdots$$

converges.

Proof Let s_n = a_1 + a_2 + · · · + a_n and let t_k = a_1 + 2a_2 + · · · + 2^k a_{2^k}. Since all terms in the series ∑a_n are nonnegative we may write, for n < 2^k,

$$s_n \le a_1 + (a_2 + a_3) + (a_4 + a_5 + a_6 + a_7) + \cdots + (a_{2^k} + \cdots + a_{2^{k+1}-1})$$

since the right side is s_{2^{k+1}-1}, which is obtained from s_n by adding the nonnegative terms a_{n+1}, . . . , a_{2^{k+1}-1}. But {a_n} is a decreasing sequence, so noting that the last term in parentheses consists of 2^k terms, each at most a_{2^k}, we have

$$s_n \le a_1 + 2a_2 + 4a_4 + \cdots + 2^k a_{2^k} = t_k .$$

Similarly, if n > 2^k we have

$$\begin{aligned} s_n &\ge a_1 + a_2 + (a_3 + a_4) + \cdots + (a_{2^{k-1}+1} + \cdots + a_{2^k}) \\ &\ge \tfrac{1}{2}a_1 + a_2 + 2a_4 + \cdots + 2^{k-1}a_{2^k} \\ &= \tfrac{1}{2}t_k . \end{aligned}$$

We have now shown that n < 2^k implies s_n ≤ t_k, and n > 2^k implies 2s_n ≥ t_k. Thus the sequences {s_n} and {t_k} are either both bounded or both unbounded. Together with Theorem B18, this completes the proof. ∎
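The two inequalities at the heart of this proof can be spot-checked numerically. The sketch below is a hypothetical check, not a proof: it verifies s_n ≤ t_k for n < 2^k and 2s_n ≥ t_k for n ≥ 2^k over a finite range, using the terms a_n = n^{-p} of the series treated in Example B5 (nonnegative and decreasing, as Theorem B19 requires). The small tolerance `tol` is an assumption to absorb floating-point rounding.

```python
from itertools import accumulate


def condensation_check(a, K: int = 10, tol: float = 1e-9) -> bool:
    """Check s_n <= t_k when n < 2^k and 2*s_n >= t_k when n >= 2^k,
    for 1 <= n < 2^(K+1) and 0 <= k <= K.  `a` maps n >= 1 to a_n and is
    assumed nonnegative and nonincreasing."""
    n_max = 2 ** (K + 1) - 1
    s = [0.0] + list(accumulate(a(j) for j in range(1, n_max + 1)))  # s[n] = s_n
    t = list(accumulate(2**j * a(2**j) for j in range(K + 1)))        # t[k] = t_k
    for k in range(K + 1):
        for n in range(1, n_max + 1):
            if n < 2**k and s[n] > t[k] + tol:
                return False
            if n >= 2**k and 2 * s[n] < t[k] - tol:
                return False
    return True


def p_series(p: float):
    """Term function a_n = n^(-p)."""
    return lambda n: 1 / n**p


for p in (0.5, 1.0, 2.0):
    t = list(accumulate(2**j * p_series(p)(2**j) for j in range(11)))
    # t_10 grows with K for p <= 1 and stays below 2 for p = 2.
    print(p, condensation_check(p_series(p)), t[-1])
```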

The interesting aspect of this theorem is that the convergence of ∑a_n is determined by the convergence of a rather "sparse" subsequence.

Example B5 Let us show that the series ∑n^{-p} converges if p > 1 and diverges if p ≤ 1. Indeed, suppose p > 0. Then by Theorem B19 we consider the series

$$\sum_{k=0}^{\infty} 2^k \cdot 2^{-kp} = \sum_{k=0}^{\infty} 2^{k(1-p)} .$$

For this series to converge, the corollary to Theorem B17 requires 2^{k(1-p)} → 0, so we must have 1 - p < 0, that is, p > 1. In this case, ∑2^{k(1-p)} = ∑(2^{1-p})^k is a geometric series which converges as in Example B6, and hence Theorem B19 shows that ∑n^{-p} converges for p > 1. If p ≤ 0, then ∑n^{-p} diverges by the corollary to Theorem B17. ∆

If we are given a series ∑a_n, we could rearrange the terms in this series to obtain a new series ∑a'_n. Formally, we define this rearrangement by letting {k_n} be a sequence in which every positive integer appears exactly once. In other words, {k_n} is a one-to-one mapping from Z^+ onto Z^+. If we now define a'_n = a_{k_n} for n = 1, 2, . . . , then the corresponding series ∑a'_n is called a rearrangement of the series ∑a_n.

For each of the series ∑a_n and ∑a'_n, we form the respective sequences of partial sums {s_k} and {s'_k}. Since these sequences are clearly different in general, it is not generally the case that they both converge to the same limit. While we will not treat this problem in any detail, there is one special case that we will need. This will be given as a corollary to the following theorem.

A series ∑a_n is said to converge absolutely if the series ∑|a_n| converges.

Theorem B20 If ∑a_n converges absolutely, then ∑a_n converges.
Proof Note |∑_{k=n}^{m} a_k| ≤ ∑_{k=n}^{m} |a_k| and apply Theorem B17. ∎

Corollary If ∑a_n converges absolutely, then every rearrangement of ∑a_n converges to the same sum.

Proof Let ∑a'_n be a rearrangement with partial sums s'_k. Since ∑a_n converges absolutely, we may apply Theorem B17 to conclude that for every ε > 0 there exists N such that m ≥ n > N implies

$$\sum_{k=n}^{m} |a_k| < \varepsilon . \qquad (*)$$

Using the notation of the discussion preceding the theorem, we let p ∈ Z^+ be such that the integers 1, . . . , N are contained in the collection k_1, . . . , k_p (note that we must have p ≥ N). If for any n > p we now form the difference s_n - s'_n, then the numbers a_1, . . . , a_N will cancel out (as may some other numbers if p > N), and hence, since (*) applies to all m ≥ n > N, we are left with |s_n - s'_n| < ε. This shows that {s_n} and {s'_n} both converge to the same sum (since if s_n → s, then |s'_n - s| ≤ |s'_n - s_n| + |s_n - s| < 2ε for all sufficiently large n, which implies that s'_n → s also). ∎
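A numerical illustration of this corollary, under stated assumptions: the absolutely convergent series ∑(-1)^{n+1}/n^2, whose sum is π^2/12, is compared with the hypothetical rearrangement that takes two odd-indexed (positive) terms followed by one even-indexed (negative) term. Only M terms of each are summed, so both results are approximations; the rearranged partial sums approach the same limit, only more slowly.

```python
import math
from itertools import islice


def a(n: int) -> float:
    """a_n = (-1)^(n+1) / n^2; sum |a_n| converges, so the series converges absolutely."""
    return (-1) ** (n + 1) / n**2


def two_odds_then_one_even():
    """Yield a one-to-one map k_1, k_2, ... of Z+ onto Z+: 1, 3, 2, 5, 7, 4, 9, 11, 6, ..."""
    odd, even = 1, 2
    while True:
        yield odd
        odd += 2
        yield odd
        odd += 2
        yield even
        even += 2


M = 300_000
s = sum(a(n) for n in range(1, M + 1))                                 # s_M
s_rearranged = sum(a(k) for k in islice(two_odds_then_one_even(), M))  # s'_M
print(s, s_rearranged, math.pi**2 / 12)
```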

We remark that Theorems B17 and B20 apply equally well to any com- 
plete normed space if we replace the absolute value by the appropriate norm. 

Before presenting any examples of series, we first compute the limits of 
some commonly occurring sequences of real numbers. 

Theorem B21 (a) If p > 0, then lim_{n→∞} 1/n^p = 0.

(b) If p > 0, then lim_{n→∞} p^{1/n} = 1.

(c) lim_{n→∞} n^{1/n} = 1.

(d) If p > 0 and r is real, then lim_{n→∞} n^r/(1 + p)^n = 0.

(e) If |x| < 1, then lim_{n→∞} x^n = 0.

Proof (a) Given ε > 0, we seek an N such that n > N implies 1/n^p < ε. Then choose N > (1/ε)^{1/p}.

(b) If p = 1, there is nothing to prove. For p > 1, define x_n = p^{1/n} - 1 > 0 so that by the binomial theorem (Example 0.7) we have

$$p = (1 + x_n)^n \ge 1 + n x_n .$$

Thus 0 < x_n ≤ (p - 1)/n so that lim x_n = 0, and hence lim p^{1/n} = 1. If 0 < p < 1, we define y_n = (1/p)^{1/n} - 1 > 0. Then

$$\frac{1}{p} = (1 + y_n)^n \ge 1 + n y_n$$

so that y_n → 0, and hence lim (1/p)^{1/n} = 1. Taking reciprocals, we again have lim p^{1/n} = 1.

(c) Let x_n = n^{1/n} - 1 ≥ 0, so that using only the quadratic term in the binomial expansion yields

$$n = (1 + x_n)^n \ge \binom{n}{2} x_n^2 = \frac{n(n-1)}{2}\, x_n^2 .$$

Thus (for n ≥ 2) we have 0 ≤ x_n ≤ [2/(n - 1)]^{1/2} so that x_n → 0. Therefore lim n^{1/n} = 1.

(d) Let k be any integer > 0 such that k > r. Choosing the term in p^k from the binomial expansion we have (since p > 0)

$$(1 + p)^n > \binom{n}{k} p^k = \frac{n(n-1)\cdots(n-(k-1))}{k!}\, p^k .$$

If we let n > 2k, then k < n/2 so that n > n/2, n - 1 > n/2, . . . , n - (k - 1) > n/2 and hence (1 + p)^n > (n/2)^k p^k / k!. Thus (for n > 2k)

$$0 < \frac{n^r}{(1+p)^n} < \frac{2^k k!}{p^k}\, n^{r-k} .$$

Since r - k < 0, it follows that n^{r-k} → 0 by (a).

(e) If x = 0 there is nothing to prove. Otherwise write |x| = 1/(1 + p) with p > 0, so that |x^n| = n^0/(1 + p)^n, and choose r = 0 in (d). ∎

Corollary If N > 0 is any finite integer, then lim_{n→∞} (n^N)^{1/n} = 1.

Proof (n^N)^{1/n} = n^{N/n} = (n^{1/n})^N so that (by Theorem B2(c))

$$\lim_{n\to\infty} (n^{1/n})^N = \left(\lim_{n\to\infty} n^{1/n}\right)^N = 1^N = 1 .$$

∎
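The limits in Theorem B21 can be watched numerically. The sketch below uses the hypothetical parameter values base = 5 for part (b) and r = 3, p = 0.1 for part (d). It computes n^r/(1 + p)^n as exp(r log n - n log(1 + p)) because (1 + p)**n overflows a Python float for large n. Note that the quantity in (d) is still large for small n; the theorem only describes its eventual behavior.

```python
import math


def b21_samples(n: int, base: float = 5.0, r: float = 3.0, p: float = 0.1):
    """Return n^(1/n), base^(1/n) and n^r / (1 + p)^n for a single n."""
    root_n = n ** (1 / n)                                   # Theorem B21(c): tends to 1
    root_base = base ** (1 / n)                             # Theorem B21(b): tends to 1
    ratio = math.exp(r * math.log(n) - n * math.log1p(p))   # Theorem B21(d): tends to 0
    return root_n, root_base, ratio


for n in (10, 100, 1000, 10_000):
    print(n, *b21_samples(n))
```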



Example B6 The geometric series ∑_{n=0}^∞ x^n converges for |x| < 1 and diverges for |x| ≥ 1. Indeed, from elementary algebra we see that (for x ≠ 1)

$$1 + x + x^2 + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x} .$$

If |x| < 1, we clearly have lim x^{n+1} = 0, and hence

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x} .$$

If |x| > 1, then |x|^{n+1} → ∞ and the series diverges. In the case that |x| = 1, we see that x^n ↛ 0, so the series diverges. ∆
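The closed form for the partial sums can be checked against direct summation. A minimal sketch (the function name is hypothetical) that adds the terms one at a time and compares with (1 - x^{n+1})/(1 - x) and, for |x| < 1, with the limit 1/(1 - x):

```python
def geometric_partial_sum(x: float, n: int) -> float:
    """Return 1 + x + ... + x^n, summed term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= x
    return total


x = 0.5
for n in (5, 10, 20, 40):
    closed_form = (1 - x ** (n + 1)) / (1 - x)
    print(n, geometric_partial_sum(x, n), closed_form, 1 / (1 - x))
```

For x = 1 the closed form does not apply, and the partial sums are simply n + 1, which grow without bound.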

Let {x_n} be a sequence of real numbers. In general this sequence may have many cluster points, in which case it will not converge to a definite limit. Now define the sequences

$$U_n = \sup_{k \ge n} x_k$$

and

$$L_n = \inf_{k \ge n} x_k .$$

Note that U_n is a decreasing sequence and L_n is an increasing sequence. If α is the largest cluster point of {x_n}, then clearly the U_n will approach α as n increases. Furthermore, no matter how large n gets, U_n will always remain ≥ α. Similarly, if β is the smallest cluster point of {x_n}, then all the L_n must be ≤ β. This situation is represented schematically in the figure below.

[Figure: points on the real line R, in increasing order L_1, L_2, . . . , L_n, then β and α, then U_n, . . . , U_2, U_1.]

Let U = inf_n U_n. By Theorem B4 and the remarks following it we see that U_n converges to U. The limit U is called the upper limit (or limit superior) of the sequence {x_n} and will be denoted by x̄. In other words,

$$\overline{x} = \inf_n \sup_{k \ge n} x_k = \lim_{n\to\infty} \sup_{k \ge n} x_k .$$

The upper limit is frequently also written as lim sup x_n.

Similarly, L_n converges to L = sup_n L_n. The limit L is called the lower limit (or limit inferior) of the sequence {x_n}, and is denoted by x̲. Thus

$$\underline{x} = \sup_n \inf_{k \ge n} x_k = \lim_{n\to\infty} \inf_{k \ge n} x_k$$

which is also written as lim inf x_n. Note that either or both of x̄ and x̲ could be ±∞.

Theorem B22 If x_n ≤ y_n for all n greater than or equal to some fixed N, then lim sup x_n ≤ lim sup y_n and lim inf x_n ≤ lim inf y_n.

Proof This is Exercise B.6. ∎

We have already remarked that in general a sequence may have many (or no) cluster points, and hence will not converge. However, suppose {x_n} converges to x, and let lim U_n = U. We claim that x = U.

To see this, we simply use the definitions involved. Given ε > 0, we may choose N such that for all n ≥ N we have both |x - x_n| < ε and |U - U_n| < ε. Since U_N = sup_{k≥N} x_k, we see that given this ε, there exists k ≥ N such that U_N - ε < x_k, or U_N - x_k < ε. But then we have

$$|U - x| \le |U - U_N| + |U_N - x_k| + |x_k - x| < 3\varepsilon$$

which proves that U = x. In an exactly analogous way, it is easy to prove that L = lim L_n = x (see Exercise B.7). We have therefore shown that x_n → x implies lim sup x_n = lim inf x_n = x. That the converse of this statement is also true is given in the next theorem. It should be clear, however, that for any ε > 0 all but a finite number of terms in the sequence {x_n} will lie between L - ε and U + ε, and hence if U = L it must be true that x_n → x = U = L.

Theorem B23 A real-valued sequence {x_n} converges to the number x if and only if lim sup x_n = lim inf x_n = x.

Proof Let U_n = sup_{k≥n} x_k and L_n = inf_{k≥n} x_k, and first suppose that lim U_n = lim L_n = x. Given ε > 0, there exists N such that |U_n - x| < ε for all n ≥ N, and there exists M such that |L_n - x| < ε for all n ≥ M. These may be written as (see Example 0.6) x - ε < U_n < x + ε for all n ≥ N, and x - ε < L_n < x + ε for all n ≥ M. But from the definitions of U_n and L_n we know that x_n ≤ U_n and L_n ≤ x_n. Hence x_n < x + ε for all n ≥ N and x - ε < x_n for all n ≥ M. Therefore |x_n - x| < ε for all n ≥ max{N, M} so that x_n → x.

The converse was shown in the discussion preceding the theorem. ∎

Define S to be the set of all cluster points of {x_n}. Since any cluster point is the limit of some subsequence, it follows that S is just the set of all subsequential limits of {x_n}. From the figure above, we suspect that sup S = x̄ and inf S = x̲. It is not hard to prove that this is indeed the case.

Theorem B24 Let {x_n} be a sequence of real numbers and let S, x̄ and x̲ be defined as above. Then sup S = x̄ and inf S = x̲.

Proof This is Exercise B.8. ∎
Example B7 Let x_n = (-1)^n/(1 + 1/n). Then it should be clear that we have lim sup x_n = 1 and lim inf x_n = -1. ∆
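The sequences U_n and L_n can be approximated by maxima and minima of tails of a finite truncation. For the sequence of Example B7 every tail supremum equals 1 (never attained), so a truncation only shows values just below 1. For contrast, the sketch also uses the hypothetical variant y_n = (-1)^n(1 + 1/n), whose U_n strictly decrease to 1 and whose L_n strictly increase to -1. Truncating at N terms is an assumption of the sketch.

```python
def tail_bounds(xs, start):
    """Return (max(xs[start:]), min(xs[start:])), a finite stand-in for (U_n, L_n) with n = start + 1."""
    tail = xs[start:]
    return max(tail), min(tail)


N = 10_000
b7 = [(-1) ** n / (1 + 1 / n) for n in range(1, N + 1)]       # Example B7
variant = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]  # hypothetical variant

for start in (0, 9, 99, 999):
    print(start + 1, tail_bounds(b7, start), tail_bounds(variant, start))
```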

Our next theorem will be very useful in proving several tests for the convergence of series.

Theorem B25 Let {x_n} be a sequence of real numbers, let S be the set of all subsequential limits of {x_n}, and let x̄ = lim sup x_n and x̲ = lim inf x_n. Then

(a) x̄ ∈ S.

(b) If r > x̄, then there exists N such that n ≥ N implies x_n < r.

(c) x̄ is unique, in the sense that it is the only number satisfying both (a) and (b).

Of course, the analogous results hold for x̲ as well.

Proof We will show only the results for x̄, and leave the case of x̲ to the reader.

(a) Since S (the set of all subsequential limits) lies in the extended number system, we must consider three possibilities. If -∞ < x̄ < +∞, then S is bounded above so that at least one subsequential limit exists. Then the set S is closed (by Theorem B16), and hence x̄ = sup S ∈ S (see Example B2).

If x̄ = +∞, then S is not bounded above so that {x_n} is not bounded above. Thus there exists a subsequence {x_{n_k}} such that x_{n_k} → +∞. But then +∞ ∈ S so that x̄ ∈ S.

If x̄ = -∞, then there is no finite subsequential limit (since x̄ is the least upper bound of the set of such limits), and hence S consists solely of the element -∞. This means that given any real M, x_n > M for at most a finite number of indices n so that x_n → -∞, and hence x̄ = -∞ ∈ S.

(b) If there existed an r > x̄ such that x_n ≥ r for an infinite number of n, then there would be a subsequential limit x' of {x_n} such that x' ≥ r > x̄. This contradicts the definition of x̄.

(c) Let x and y be distinct numbers that satisfy (a) and (b), and suppose x < y. Let r be any number such that x < r < y (that such an r exists was shown in Theorem 0.4). Since x satisfies (b), there exists N such that x_n < r for all n ≥ N. But then y cannot possibly satisfy (a). ∎

We now have the background to prove three basic tests for the conver- 
gence of series. 

Theorem B26 (a) (Comparison test) If ∑b_n converges, and if |a_n| ≤ b_n for n ≥ N (N fixed), then ∑a_n converges. If ∑c_n diverges, and if a_n ≥ c_n ≥ 0 for n ≥ N, then ∑a_n diverges.

(b) (Root test) Given the series ∑a_n, let α = lim sup |a_n|^{1/n}. If α < 1, then ∑a_n converges, and if α > 1, then ∑a_n diverges.

(c) (Ratio test) The series ∑a_n converges if lim sup |a_{n+1}/a_n| < 1, and diverges if |a_{n+1}/a_n| ≥ 1 for n ≥ N (N fixed).

Proof (a) Given ε > 0, there exists N' ≥ N such that m ≥ n > N' implies that |∑_{k=n}^{m} b_k| < ε (Theorem B17). Hence ∑a_n converges since

$$\left|\sum_{k=n}^{m} a_k\right| \le \sum_{k=n}^{m} |a_k| \le \sum_{k=n}^{m} b_k = \left|\sum_{k=n}^{m} b_k\right| < \varepsilon .$$

By what has just been shown, we see that if 0 ≤ c_n ≤ a_n and ∑a_n converges, then ∑c_n must also converge. But the contrapositive of this statement is then that if ∑c_n diverges, then so must ∑a_n.

(b) First note that α ≥ 0 since |a_n|^{1/n} ≥ 0. Now suppose that α < 1. By Theorem B25(b), for any r such that α < r < 1, there exists N such that n ≥ N implies |a_n|^{1/n} < r, and thus |a_n| < r^n. But ∑r^n converges (Example B6) so that ∑a_n must also converge by the comparison test.

If α > 1, then by Theorem B25(a) there must exist a sequence {n_k} such that |a_{n_k}|^{1/n_k} → α. But this means that |a_n| > 1 for infinitely many n, so that a_n ↛ 0 and ∑a_n does not converge (corollary to Theorem B17).

(c) If lim sup |a_{n+1}/a_n| < 1 then, by Theorem B25(b), we can find a number r < 1 and an integer N such that n ≥ N implies |a_{n+1}/a_n| < r. We then see that

$$\begin{aligned} |a_{N+1}| &< r\,|a_N| \\ |a_{N+2}| &< r\,|a_{N+1}| < r^2 |a_N| \\ &\;\;\vdots \\ |a_{N+p}| &< r^p |a_N| . \end{aligned}$$

Therefore, letting n = N + p we have

$$|a_n| < r^{n-N} |a_N| = \left(r^{-N} |a_N|\right) r^n$$

for n > N, and hence ∑a_n converges by the comparison test and Example B6.

If |a_{n+1}| ≥ |a_n| for n ≥ N (N fixed), then clearly a_n ↛ 0, so that ∑a_n cannot converge (corollary to Theorem B17). ∎



Note that if α = 1 when applying the root test we get no information since, for example, ∑1/n and ∑1/n^2 both have α = 1 (corollary to Theorem B21), but the first diverges whereas the second converges (see Example B5).
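Since lim sup involves infinitely many terms, a computer can only suggest its value. The sketch below estimates |a_n|^{1/n} and |a_{n+1}/a_n| for the hypothetical series a_n = n^2/2^n (for which both tend to 1/2, so both tests give convergence), and |a_n|^{1/n} for ∑1/n and ∑1/n^2, where the values approach 1 and the root test is inconclusive.

```python
def nth_root_abs(a, n: int) -> float:
    """|a_n|^(1/n), the quantity whose lim sup is alpha in the root test."""
    return abs(a(n)) ** (1 / n)


def ratio_abs(a, n: int) -> float:
    """|a_{n+1} / a_n|, the quantity used in the ratio test."""
    return abs(a(n + 1) / a(n))


def a(n: int) -> float:
    return n**2 / 2**n  # hypothetical series with alpha = 1/2


def harmonic(n: int) -> float:
    return 1 / n


def inverse_square(n: int) -> float:
    return 1 / n**2


for n in (10, 100, 1000):
    print(n, nth_root_abs(a, n), ratio_abs(a, n),
          nth_root_abs(harmonic, n), nth_root_abs(inverse_square, n))
```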
Example B8 Consider the function e^x (= exp x) defined by

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

where x may be real or complex. To test for convergence, we observe that |a_{n+1}/a_n| = |x/(n + 1)| so that

$$\limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \sup_{k \ge n} \frac{|x|}{k+1} = \lim_{n\to\infty} \frac{|x|}{n+1} = 0$$

and hence the series converges by the ratio test.

It is of great use in both mathematics and physics to prove that

$$e^x = \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n .$$

While this can be proved by taking the logarithm of both sides and using l'Hôpital's rule, we shall follow a direct approach. Let x_n = (1 + x/n)^n. Expanding x_n by the binomial theorem we have (for n > 2)

$$\begin{aligned} x_n &= \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!} \left(\frac{x}{n}\right)^k \\ &= 1 + x + \frac{n(n-1)}{2!}\,\frac{x^2}{n^2} + \frac{n(n-1)(n-2)}{3!}\,\frac{x^3}{n^3} + \cdots + \frac{x^n}{n^n} . \end{aligned}$$

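If we write

$$\frac{1}{n^n} = \frac{1}{n!}\,\frac{n!}{n^n} = \frac{1}{n!}\,\frac{n(n-1)(n-2)\cdots(n-(n-1))}{n \cdot n \cdots n} = \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right)$$

then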
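$$\begin{aligned} x_n = 1 + x &+ \frac{1}{2!}\left(1 - \frac{1}{n}\right)x^2 + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)x^3 + \cdots \\ &+ \frac{1}{k!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)x^k + \cdots \\ &+ \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right)x^n . \end{aligned}$$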

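We now treat each x_n as an infinite series by defining all terms with k > n to be zero, and we consider the difference

$$e^x - x_n = \sum_{k=2}^{\infty} \frac{1}{k!}\left[1 - \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)\right]x^k . \qquad (*)$$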

Applying Theorem B17 to the convergent series $e^{|x|} = \sum |x|^n/n!$, we see that for fixed $x$ and $\varepsilon > 0$, we can choose an integer $m$ sufficiently large that

$$\sum_{k=m+1}^{\infty} \frac{|x|^k}{k!} < \varepsilon/2 .$$



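Writing (*) in the form $\sum_{k=2}^{\infty} = \sum_{k=2}^{m} + \sum_{k=m+1}^{\infty}$ and noting that the coefficient of $x^k$ in the (second) sum is $\ge 0$ but $\le 1/k!$, we obtain (for $n > m$)

$$|e^x - x_n| \le \sum_{k=2}^{m} \frac{1}{k!}\left[1 - \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)\right]|x|^k + \varepsilon/2 .$$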



Since the sum in this expression consists of a finite number of terms, each of which approaches $0$ as $n \to \infty$, we may choose an integer $N \ge m$ such that the sum is less than $\varepsilon/2$ for $n > N$. Therefore, for $n > N$ we have $|e^x - x_n| < \varepsilon$, which proves that $x_n \to e^x$. ∎
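The limit can also be observed numerically. The sketch below computes $x_n = (1 + x/n)^n$ for a sample complex $x$, chosen arbitrarily as an assumption, and prints the distance to `cmath.exp(x)`. The error shrinks roughly in proportion to $1/n$. That rate is a heuristic observation, not something the proof establishes.

```python
import cmath


def compound(x, n):
    """x_n = (1 + x/n)^n for a real or complex x."""
    return (1 + x / n) ** n


x = 1.5 + 0.5j  # arbitrary sample point
for n in (10, 100, 1_000, 10_000):
    err = abs(cmath.exp(x) - compound(x, n))
    print(n, err)
```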



Exercises 

1. Prove Theorem B2(b). 

2. Let $\{x_n\}$ and $\{y_n\}$ be sequences of real numbers such that $x_n \le y_n$ for all $n \ge N$ where $N$ is fixed. If $x_n \to x$ and $y_n \to y$, prove that $x \le y$.

3. If $A$ is a subset of a metric space $X$ and $x \in X$, prove that $x \in \operatorname{Ext} A$ if and only if $x$ has some neighborhood disjoint from $A$.
4. Prove Theorem B14(b).

5. Prove that any cluster point is the limit of a subsequence. 

6. Prove Theorem B22. 

7. If $\{x_n\}$ is a sequence of real numbers converging to $x$, and $L_n = \inf_{k \ge n} x_k$ converges to $L$, show that $x = L$.

8. Prove Theorem B24. 

9. Let $\overline{\mathbb{R}}$ denote the extended real number system, and let $f\colon (a, b) \subset \mathbb{R} \to \overline{\mathbb{R}}$. Define

$$\limsup_{x \to y} f(x) = \inf_{\delta > 0} \sup_{0 < |x - y| < \delta} f(x)$$

and suppose that $\lim_{x \to y} f(x) = L$ (i.e., $\lim_{x \to y} f(x)$ exists). Show that

$$\limsup_{x \to y} f(x) = L .$$

[Hint: Let $S_\delta = \sup_{0 < |x - y| < \delta} f(x)$ and define $S = \inf_\delta S_\delta$. Then note that $|S - L| \le |S - S_\delta| + |S_\delta - f(x)| + |f(x) - L|$.]

10. (a) Let $\{x_n\}$ be a Cauchy sequence in $\mathbb{R}^n$, and assume that $\{x_n\}$ has a cluster point $c$. Prove that $\{x_n\}$ converges to $c$.

(b) Using this result, prove that any Cauchy sequence in $\mathbb{R}^n$ converges to a point of $\mathbb{R}^n$.

11. Show that $\operatorname{Bd} A = \operatorname{Cl} A \cap \operatorname{Cl} A^c$.

12. If $U \subset (X, d)$ is open and $A \subset X$ is dense, show that $\operatorname{Cl} U = \operatorname{Cl}(U \cap A)$.

13. If $\{x_n\}$ is a Cauchy sequence with a convergent subsequence $\{x_{n_k}\}$, show that $\{x_n\}$ converges.
In order to avoid having to define a general topological space, we shall phrase
this appendix in terms of metric spaces. However, the reader should be aware 
that this material is far more general than we are presenting it. We assume that 
the reader has studied Appendix A. 

In elementary analysis and geometry, one thinks of a curve as a collection of points whose coordinates are continuous functions of a real variable $t$. For example, a curve in the plane $\mathbb{R}^2$ may be specified by giving its coordinates $(x = f(t), y = g(t))$ where $f$ and $g$ are continuous functions of the parameter $t$. If we require that the curve join two points $p$ and $q$, then the parameter can always be adjusted so that $t = 0$ at $p$ and $t = 1$ at $q$. Thus we see that the curve is described by a continuous mapping from the unit interval $I = [0, 1]$ into the plane.

Let $X$ be a metric space, and let $I = [0, 1]$ be a subspace of $\mathbb{R}$ with the usual metric. We define a path in $X$, joining two points $p$ and $q$ of $X$, to be a continuous mapping $f\colon I \to X$ such that $f(0) = p$ and $f(1) = q$. This path will be said to lie in a subset $A \subset X$ if $f(I) \subset A$. It is important to realize that the path is the mapping $f$, and not the set of image points $f(I)$. The space $X$ is said to be path connected if for every $p, q \in X$ there exists a path in $X$ joining $p$ and $q$. If $A \subset X$, then $A$ is path connected if every pair of points of $A$ can be joined by a path in $A$. (We should note that what we have called path connected is sometimes called arcwise connected.)
Let us consider for a moment the space $\mathbb{R}^n$. If $x_i, x_j \in \mathbb{R}^n$, then we let $\overline{x_i x_j}$ denote the closed line segment joining $x_i$ and $x_j$. A subset $A \subset \mathbb{R}^n$ is said to be polygonally connected if given any two points $p, q \in A$ there are points $x_0 = p, x_1, x_2, \ldots, x_m = q$ in $A$ such that $\bigcup_{i=1}^{m} \overline{x_{i-1} x_i} \subset A$.
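The union of segments $\overline{x_{i-1} x_i}$ can be traced as a single path by running along the $i$-th segment while $t$ ranges over $[(i-1)/m, i/m]$. This is why every polygonally connected set is path connected. The sketch below is a minimal illustration of that parameterization. The function `polygonal_path` is hypothetical and represents points as tuples.

```python
from fractions import Fraction


def polygonal_path(points):
    """Given vertices x_0 = p, x_1, ..., x_m = q in R^n (as tuples), return a map
    f: [0, 1] -> R^n that runs along the segment from x_{i-1} to x_i while t
    ranges over [(i-1)/m, i/m]. Consecutive pieces agree at t = i/m, so f is
    continuous."""
    m = len(points) - 1
    if m < 1:
        raise ValueError("need at least two vertices")

    def f(t):
        if not 0 <= t <= 1:
            raise ValueError("t must lie in [0, 1]")
        i = min(int(t * m), m - 1)  # 0-based index of the segment containing t
        s = t * m - i               # local parameter in [0, 1] on that segment
        a, b = points[i], points[i + 1]
        return tuple((1 - s) * ai + s * bi for ai, bi in zip(a, b))

    return f


# Hypothetical example: an L-shaped polygonal path in R^2 from p = (0, 0) to q = (1, 1).
f = polygonal_path([(0, 0), (1, 0), (1, 1)])
print(f(0), f(Fraction(1, 2)), f(1))
```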




Just because a subset of $\mathbb{R}^n$ is path connected does not mean that it is polygonally connected. For example, the unit circle in $\mathbb{R}^2$ is path connected since it is actually a path itself, but it is not polygonally connected, because it contains no line segment of positive length.

Example C1 The space $\mathbb{R}^n$ is path connected. Indeed, if $p \in \mathbb{R}^n$ has coordinates $(x^1, \ldots, x^n)$ and $q \in \mathbb{R}^n$ has coordinates $(y^1, \ldots, y^n)$, then we define the mapping $f\colon I \to \mathbb{R}^n$ by $f(t) = (f^1(t), \ldots, f^n(t))$ where $f^i(t) = (1 - t)x^i + t y^i$. This mapping is clearly continuous and satisfies $f(0) = p$ and $f(1) = q$. Thus $f$ is a path joining the arbitrary points $p$ and $q$ of $\mathbb{R}^n$, and hence $\mathbb{R}^n$ is path connected. ∎
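The straight-line path of Example C1 translates directly into code. The sketch below represents points of $\mathbb{R}^n$ as tuples of floats, which is our assumption. `straight_line_path` is a hypothetical helper that returns the map $t \mapsto ((1-t)x^i + t y^i)_i$. Continuity shows up in the identity $f(s) - f(t) = (s - t)(q - p)$.

```python
def straight_line_path(p, q):
    """Path f: I -> R^n of Example C1, with components f^i(t) = (1 - t) x^i + t y^i."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of coordinates")

    def f(t):
        return tuple((1 - t) * xi + t * yi for xi, yi in zip(p, q))

    return f


p, q = (1.0, -2.0, 0.5), (3.0, 4.0, -1.5)
f = straight_line_path(p, q)
print(f(0.0), f(0.5), f(1.0))  # endpoints are p and q; f(0.5) is the midpoint
```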

The following is a simple consequence of Theorem A5 that we shall need 
for our main result (i.e., Theorem C2). 

Theorem C1 Let $f\colon (X_1, d_1) \to (X_2, d_2)$ and $g\colon (X_2, d_2) \to (X_3, d_3)$ both be continuous functions. Then $g \circ f\colon (X_1, d_1) \to (X_3, d_3)$ is a continuous function.

Proof If $U \subset X_3$ is open, then the continuity of $g$ shows that $g^{-1}(U) \subset X_2$ is open. Therefore $(g \circ f)^{-1}(U) = (f^{-1} \circ g^{-1})(U) = f^{-1}(g^{-1}(U))$ is open by the continuity of $f$. ∎

Theorem C2 Let f be a continuous mapping from a metric space X onto a 
metric space Y. Then Y is path connected if X is. 
Proof Let $x', y'$ be any two points of $Y$. Then (since $f$ is surjective) there exist $x, y \in X$ such that $f(x) = x'$ and $f(y) = y'$. Since $X$ is path connected, there exists a path $g$ joining $x$ and $y$ such that $g(0) = x$ and $g(1) = y$. But then $f \circ g$ is a continuous function (Theorem C1) from $I$ into $Y$ such that $(f \circ g)(0) = x'$ and $(f \circ g)(1) = y'$. In other words, $f \circ g$ is a path joining $x'$ and $y'$, and hence $Y$ is path connected. ∎
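As an illustration of this proof, take $X = \mathbb{R}$, which is path connected by Example C1 with $n = 1$, and the continuous surjection $f(\theta) = (\cos\theta, \sin\theta)$ onto the unit circle in $\mathbb{R}^2$. Composing $f$ with a straight-line path $g$ in $\mathbb{R}$ gives a path on the circle, which recovers the earlier remark that the circle is path connected. The sketch below is ours, and the helper names are hypothetical.

```python
import math


def compose(f, g):
    """Return f o g, i.e. t -> f(g(t))."""
    return lambda t: f(g(t))


def segment_in_R(a, b):
    """Path g: I -> R joining a and b (Example C1 with n = 1)."""
    return lambda t: (1 - t) * a + t * b


def to_circle(theta):
    """Continuous map of R onto the unit circle in R^2."""
    return (math.cos(theta), math.sin(theta))


# Join x' = f(0) = (1, 0) and y' = f(pi/2) = (0, 1) by the path h = f o g.
theta_x, theta_y = 0.0, math.pi / 2
h = compose(to_circle, segment_in_R(theta_x, theta_y))
print(h(0.0), h(1.0))  # (1.0, 0.0) and approximately (0.0, 1.0)
```

Every value `h(t)` lies on the circle, so `h` is a path in the circle joining $x'$ and $y'$, exactly as the proof constructs it.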

It is an obvious corollary of Theorem C2 that if f is a continuous mapping 
from the path connected space X into Y, then f(X) is path connected in Y 
since f maps X onto the subspace f(X). 
  zero 261, 297
Polynomial equation 261
  solution 261
Polynomial function 255
Positive integers 2
Positive transformation 539
Power set 7
Preimage 4
Primary decomposition theorem 339
Prime number 25
Prime polynomial 262
Principal ideal 60
  generator of 60
Product 625
Projection 157, 232, 352, 654
Pull-back 595
Push-forward 602
Pythagorean theorem 95, 103, 622

Quadratic
  form 471
    diagonal representation 477
  polynomial 471
Quaternions 280
Quotient 25
Quotient group 64
Quotient ring 64
Quotient space 362

Raising an index 611
Rank
  of a bilinear form 465
  of a matrix 135
Ratio test 717
Rational canonical form 416
Rational numbers 2
Rayleigh quotient 513
Rearrangement lemma 34
Reducible representation 333
Reflexive space 671
Relation 7
Relatively prime 28, 263
Remainder 25
Remainder theorem 261
Resolution of the identity 524
r-forms 554
Riesz representation theorem 666
Riesz-Fischer theorem 663
Right identity 31
Right inverse 31, 157
Right zero divisor 163
Ring 53
  associates 262
  associative 53
  commutative 53
  embedded 282
  extension 282
  homomorphism 56
    kernel of 59
  isomorphism 59
  with unit element 53
Ring of sets 4
r-linear form 544
Root 261
  multiplicity of 281, 305
Root test 717
Row canonical form 125
Row space 128
Row-column-equivalent 169



Scalar 69
Scalar mapping 301
Scalar multiplication 68
Scalar product 94
Scalar triple product 588
Schur canonical form 384
Schur's lemma 335
Schwarz's inequality 20
  generalized 649
Second dual 222, 452
Secular equation 309
Separable 647, 695
Sequence 696
  Cauchy 699
  decreasing 699
  increasing 699
  limit of 622, 696
  monotonic 699
  range 697
Series 708
  rearrangement of 710
Sesquilinear form 620
Set 2
  closed 683
  complement of 2
  countable 11
  countably infinite 11
  disjoint 3
  family of 2
  finite 11
  infinite 11
  intersection 3
  open 681
  symmetric difference 4
  uncountable 11
  union 2
Shuffle 563
Signature 477
Signed permutation matrix 389
Similar matrices 184, 245
Similarity class 329
Similarity invariants 408
Similarity transformation 184, 245
Simple root 305
Smith canonical form 400
Solution set 116
Space of linear functionals 222
Space of linear transformations 220
Spectral decomposition 524
Spectral theorem 525
Spectrum 346
  degenerate 346
Square root 15
Standard basis 79
Standard inner product 99, 620
Standard orientation 608
Subdeterminant 185
Subgroup 33
  index of 62
  normal 62
Submatrix 185, 193, 209
Subsequence 707
Subsequential limit 707
Subset 2
  proper 2
Subspace 72, 649
  closed 649
  generated by 72, 660
  intersection of 86
  invariant 243, 329
  irreducible 518
  null 618
  of a metric space 684
  proper 72
  spacelike 618
  spanned by 72
  sum of 74, 86
  timelike 618
  trivial 72
Summation convention 545
Sup norm 625, 629, 633
Superdiagonal 155, 370
Superset 2
Supremum 8
Surjective 5
Sylvester's theorem 478
Symmetric group 37
Symmetrizing mapping 556



T-cyclic subspace 432
  generated by 432
T-invariant subspace 243
Tensor 545
  antisymmetric 553, 554
  classical law of transformation 550
  components 545, 547
  contraction 552
  contravariant order 545
  covariant order 545
  rank 545
  skew-symmetric 553
  symmetric 553, 554
  trace 552
  type 545
Tensor algebra 574
Tensor product 462, 464, 547, 580
Total 660
Total ordering 8
Trace 155
Transcendental number 17
Transition matrix 243
  orthogonal 249
Transpose
  of a linear transformation 459
  of a matrix 153
Transpositions 44
Triangle inequality 101
Triangular form theorem 367, 376
Two-sided inverse 157



Uniformly continuous 623
Unique factorization theorem 266
Unit (of a ring) 262
Unit cube 593
Unit matrix 392
Unit vector 99
Unitarily similar 385, 515
Unitary 183, 383, 499, 502, 678
Unitary space 99, 508
Unknowns 115
Upper limit 713

Vandermonde matrix 195
Vector 69
  length of 99
  lightlike 614
  norm of 99
  spacelike 614
  timelike 614
Vector multiplication 227
Vector space 68
  complex 69
  dimension of 77, 83
  generated by 578
  infinite-dimensional 640
  isometric 113
  normed 101
  ordinary Euclidean 613
  pseudo-Euclidean 613
  real 69
  singular 613
Vector space homomorphism 79
Volume forms 607
  equivalent 607

Wedge product 462, 563
Well-defined 4
Well-ordered 17
Weyl's formula 536

Zero divisor 57
Zero mapping 219
Zero matrix 148
Zorn's lemma 9



